Ronca Realness: Voices that Sound the Sucia Body

This series listens to the political, gendered, queer(ed), racial engagements and class entanglements involved in proclaiming out loud: La-TIN-x. ChI-ca-NA. La-TI-ne. ChI-ca-n-@. Xi-can-x. Funded by the Andrew W. Mellon Foundation as part of the Crossing Latinidades Humanities Research Initiative, the Latinx Sound Cultures Studies Working Group critically considers the role of sound and listening in our formation as political subjects. Through both a comparative and cross-regional lens, we invite Latinx Sound Scholars to join us as we dialogue about our place within the larger fields of Chicanx/Latinx Studies and Sound Studies. We are delighted to publish our initial musings with Sounding Out!, a forum that has long prioritized sound from a queered, racial, working-class and “always-from-below” epistemological standpoint. –Ed. Dolores Inés Casillas

My Puerto Rican grandmother used to sing Pedro Infante’s “Las mañanitas” to all the women in the family on their birthdays, so naturally I grew up thinking this was a Puerto Rican song. Not quite – it’s Mexican. When my family came to New York City from Puerto Rico in the 1950s, they were starved of warm waters, mountains, and family members, but they were not starved of Spanish-language music and media thanks in large part to Mexico’s Golden Age of Cinema. In the Bronx, Puerto Ricans would go to the theaters to watch movies like Nosotros los Pobres (1948), which popularized boleros like “Las Mañanitas.” This movie-going ritual in the wake of relocation and diaspora has provided the birthday soundtrack to my life.

My mother grew up listening to her father sing boleros, and she would later sing with the Florida Grand Opera Chorus when I was a child. My early knowledge of opera came from her. Growing up in Miami Beach, I would also listen to reggaetón and hip-hop in afterschool programs. The Parks & Recreation department would host dances for us, and that was where I first learned to dance perreo. My early musical surroundings represent what it means to be a colonial subject, to hear the Italianate vocal legacies of opera mixed with the Afro-Diasporic and Indigenous rhythms of reggaetón. This post contextualizes my experience within bolero’s colonial history and legacy, particularly its operatic disciplining of brown and Black bodies and voices. Reggaetóneras provide models for sonic subversion by being ronca, raspy, or breathy, and thus overriding internalized Eurocentric dichotomies of feminine and masculine vocal timbres.

When I began my own operatic training in college, I was constantly told to “purify” my voice, to resist vocal “fry,” and to handle my acid reflux by avoiding spicy foods. I was steered away from singing the pop songs I had grown up with, and kept many musical activities secret, like when I soloed for the tango ensemble and my a cappella group. In graduate school, thanks to my Latina roommates, I began listening to reggaetón again. I reunited with the voices that raised me and was reassured that their teachings of resistance would always present themselves when I needed them.

After 20 years of listening to Ivy Queen, I have located the descriptor that most closely encapsulates the way her voice sounds to me: ronca. This is Spanish for hoarse, and in my experience, it’s been used colloquially, mostly by women, to describe moments when their throats might feel sore and their voices sound raspy, or even masculine. Ronca has been articulated as an epistemology of vocal sounding in the artistry of lower-class Black reggaetón creators like Don Omar and, more recently, Ozuna. Sounding ronca is a signifier of realness, of truly knowing the struggle of race and class oppression. It is a vocalization of full-body rage fueled by poverty and colonization.

Ivy’s voice is so special to me because she sounds like my aunt when she’s had a long day, my mom when she’s yelling, and my grandma after years of having long days and yelling at people. She sounds like the raw, unfiltered power that comes from exhaustion. She sounds like inner will and justified fury. She sounds like yelling at landlords and ex-husbands for hot water and child support. She sounds like age. And she always has, even when she was “young.” And this sound is even more beautiful and life-giving to me after 4+ years in a classical voice program that told me it was bad to sound hoarse or raspy, surveilled my eating, and perpetuated the colonization of Native and Black peoples through musical subjugation.

Ivy Queen performing at Calibash 2012 in Los Angeles, California by Flickr User ElNene2k13 (CC BY 2.0)

Operatic training utilizes mechanisms that are the opposite of what is “natural” for me as a poor Latina from the barrio. It asks me to lift my voice, clarify it, and feminize it. This, to me, is antithetical to the girl who laughs really loudly, gets raspy often from yelling and eating too many Takis, and loves to sing from her chest. Ivy’s voice empowers my place as the antithesis. Even as I sang classically in college, my voice was still often described as “soulful,” “hoarse,” “raspy,” “throaty.” My voice, although in a moment of attempted cleanup in college, was read as having previously engaged in genres that disrupt colonial dichotomies of “art” and “noise.” The sonic Blackness of my timbre (in particular the exoticized and tropicalized Blackness of Latinidad in the U.S.) was legible, and perhaps even hyper-audible, in moments when I was trying to adapt to European art forms. Raquel Z. Rivera asserts in New York Ricans from the Hip Hop Zone (2003) that Latinidad doesn’t take away from Blackness but adds an element of exoticism to it. Thus, I have come to understand ronca voices as representative of a Latina/e liberatory sonic and embodied praxis that resists the derogatory discourse around racialized voices predicated on European ideals of cleanliness.

The ronca voice is negotiating suciedad, Deborah Vargas’ analytic for how queers of color may reclaim their abject bodies and social spaces. Readings of my voice in predominantly white spaces were contextualized by my queer ambiguously-brown body, which in direct opposition to whitening regimes, was sounding suciedad. This is what ronca voices do, and what I conceptualize as “ronca realness”: the tendency of Latinas/es to not hide behind the voice but rather keep it real with the audience via their vocal timbre. Ronca voices sound another option to Barthes’ hegemonic article “The Grain of the Voice,” which has been applied to Ivy Queen and Don Omar in Jennifer Domino Rudolph’s “‘Roncamos Porque Podemos,’” and Dara Goldman’s “Walk like a Woman, Talk like a Man: Ivy Queen’s Troubling of Gender.” I intend ronca realness to be understood as a queer of color vocal analytic born from community and lived experience.

RaiNao’s Queer Suciedad in San Juan, Still image by SO! from “Tentretiene”

Ronca voices reflect emotional states, flip colonial gendered vocal scripts, reveal if the singer had coffee that morning and Hot Cheetos the night before, and navigate tough musical contours with strain and stress; most importantly, they refuse to be white(ned). In college, my ronca realness was not always a choice. Keeping it real, in general, is sometimes undecided upon prior to the act of realness; it is an additional and deeply engrained responsibility that queer people of color have in white spaces to sound their dissent, or else face the continued exploitation of their communities. Further, these acts of realness may not even be legible as such but are often coded as bad behavior or an attitude problem.

Within communities of color and (im)migrant communities, it’s important to recognize that Ivy Queen’s ronca timbre was permissible because she was light-skinned, thin, and usually took on the masculine role of the rapper, rather than the feminine role of the dancer, in several of her videos. These privileges have left Afro-Latina ronca reggaetóneras like La Sista in the shadows.

La Sista has veered away from sounding ronca in recent years, but in her debut album, Majestad Negroide (2006), she praised Yoruba goddess Yemaya and Taino cacique Anacaona with a hoarse, raspy, bold sound. She is the Afro-Indigenous Latina many of us needed growing up, and her absence speaks to the ways in which Black ronca voices are policed and erased within Latinx culture and elsewhere. Let us praise her now.

Featured Image: Still image by SO! from RaiNao’s “Tentretiene”

Cloe Gentile Reyes (she/her) is a queer Boricua scholar, poet, and performer from Miami Beach. She is a soon-to-be Faculty Fellow in NYU’s Department of Music and earned her PhD in Musicology from UC Santa Barbara. Her writing explores how Caribbean femmes navigate intergenerational trauma and healing through decolonial sound, fashion, and dance. Cloe’s poems have been featured in the womanist magazine, Brown Sugar Lit, and she has presented and performed at PopCon, Society for American Music, International Association for the Study of Popular Music-US Branch, among several others. 

REWIND!…If you liked this post, you may also dig: 

Contra La Pared: Reggaetón and Dissonance in Naarm, Melbourne–Lucreccia Quintanilla

“How Many Latinos are in this Motherfucking House?”: DJ Irene, Sonic Interpellations of Dissent and Queer Latinidad in ’90s Los Angeles–Eddy Francisco Alvarez Jr.

Unapologetic Paisa Chingona-ness: Listening to Fans’ Sonic Identities–Yessica Garcia Hernandez

SO! Podcast #74: Bonus Track for Spanish Rap & Sound Studies Forum

Cardi B: Bringing the Cold and Sexy to Hip Hop—Ashley Luthers

Out Now: Log Out – A Glossary of Technological Resistance and Decentralization

This book, edited by Valeria Ferrari, Florian Idelberger, Andrea Leiter, Morshed Mannan, María-Cruz Valiente, and Balázs Bodó, brings together voices from various fields of intellectual inquiry, based on the idea that the technological, legal and societal aspects of the information sphere are interlinked and interdependent. To tackle the existing gap in shared semantics, this glossary brings together the efforts of experts from various disciplines to build a shared vocabulary on the social, technical, economic, and political aspects of decentralised, distributed or sovereign technologies: artefacts which seek to challenge the techno-social status quo by, for example, circumventing law enforcement, resisting surveillance, or being participative.

The idea of this glossary arose from the need for a workable, flexible and multidisciplinary resource for terminological clarity, one that reflects complexity rather than denying it. Situating the terms emerging through technology development in the wider context of multidisciplinary scientific, policy and political discourses, this glossary provides a conceptual toolkit for the study of the various political, economic, legal and technical struggles that decentralised, encryption-based, peer-to-peer technologies bring about and go through.

To choose relevant technology-related terms and understand them is to investigate their affordances within a given ecosystem of actors, discourses and systems of incentives. This requires an interdisciplinary, multi-layered approach that is attentive to the interlinkages between technological design nuances and their socio-political and economic implications.

The glossary was envisioned as a long-term collaborative project and a work in progress, with new entries added periodically. The present book collects the entries published in the Internet Policy Review between 2021 and 2023. It therefore represents the first volume of what will hopefully be a long-term, ever-evolving editorial collaboration whose sources of inspiration and goals develop alongside the broader discussions on decentralized technologies.

Read more about the book, order a free copy, or download the .pdf here

The Cyborg’s Prosody, or Speech AI and the Displacement of Feeling

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneer Works in Brooklyn, NY. Diedrick, a 2021 Mozilla Creative Media Award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It kicked off with Amina Abbas-Nazari’s post, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice. Then, Golden Owens took a deep historical dive into the racialized sound of servitude in America and how this impacts Intelligent Virtual Assistants. Last week, Michelle Pfeifer explored how some nations are attempting to draw sonic borders, despite the fact that voices are not passports. Today, Dorothy R. Santos wraps up the series with a meditation on what we lose due to the intensified surveilling, tracking, and modulation of our voices. [To read the full series, click here–JS]

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

In 2010, science fiction writer Charles Yu wrote a story titled “Standard Loneliness Package,” in which emotions are outsourced to another human being. Yu’s story is a literal, albeit fictitious, depiction of what emotional labor entails and the considerations it demands, and it was published a year before Apple introduced Siri as its official voice assistant for the iPhone. Humans are not meant to be viewed as a type of technology, yet capitalist and neoliberal logics continue to turn to technology as a solution to erase or filter what is least desirable, even if that means the literal modification of voice, accent, and language. What do these actions do to the body at risk of severe fragmentation and compartmentalization?

I weep.

I wail.

I gnash my teeth.

Underneath it all, I am smiling. I am giggling.

I am at a funeral. My client’s heart aches, and inside of it is my heart, not aching, the opposite of aching—doing that, whatever it is.

 Charles Yu, “Standard Loneliness Package,” Lightspeed: Science Fiction & Fantasy, November 2010

Yu sets the scene by providing specific examples of feelings of pain and loss that might be handed off to an agent who absorbs the feelings. He shows us, in one way, what a world might look and feel like if we were to go to the extreme of eradicating and offloading our most vulnerable moments to an agent or technician meant to take on this labor. Although the story was written well over a decade ago, its prescient take on the future of feelings isn’t too far off from where we find ourselves in 2023. How does the voice play into these connections between Yu’s story and what we’re facing in the technological age of voice recognition, speech synthesis, and assistive technologies? How might we re-imagine having the choice to displace our burdens onto another being or entity? Taking a cue from Yu’s story, technologies are being created that pull at the heartstrings of our memories and nostalgia. Yet what happens when we are thrust into a perpetual state of grieving and loss?

Humans are made to forget. Unlike a computer, we are fed information required for our survival. When it comes to language and expression, it is often a stochastic process of figuring out for whom we speak and who is on the receiving end of our communication and speech. Artist and scholar Fabiola Hanna believes polyvocality necessitates an active and engaged listener, which then produces our memories. Machines have become the listeners to our sonic landscapes as well as capturers, surveyors, and documenters of our utterances.

A Call Center, 1 December 2014, by Abmpublicidad (CC BY-SA 4.0)

The past few years may have seen remarkable advancement in voice tech from companies such as Amazon and Sanas AI, a voice recognition platform that allows a user to apply a vocal filter to any human voice with a discernible accent, transforming the speech into Standard American English. Yet their hopes for accent elimination and voice mimicry foreshadow a future of design without justice and software development sans cultural and societal considerations, something I work through in my artwork in progress, The Cyborg’s Prosody (2022-present).

The Cyborg’s Prosody is an interactive web-based artwork (optimized for mobile) that requires participants to read five vignettes that increasingly incorporate Tagalog words and phrases, which the player must repeat. The work serves as a type of parody, an “accent induction school” that provides a decolonial method of exploring how language and accents are learned and preserved. It is a response to the creation of accent reduction schools and coaches in the Philippines. Originally, the work was meant to be a satire and parody of these types of services, but it shifted into a docu-poetic work about my mother’s immigration story and her learning and becoming fluent in American English.

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Even though English is a compulsory language in the Philippines, it is a language learned within the parameters of an educational institution and not common speech outside of schools and businesses. From the call center agents hired at Vox Elite, a BPO company based in the Philippines, to a Filipino immigrant navigating her way through a new environment, the embodiment of language became apparent throughout the stages of research and the creative interventions of the past few years.

In Fall 2022, I gave an artist talk about The Cyborg’s Prosody to a room of predominantly older, white, cisgender male engineers and computer scientists. Apparently, my work caused a stir in one of the conversations between a small group of attendees. A couple of the engineers chose not to address me directly, but I overheard a debate between guests, with one of the engineers asking, “What is her project supposed to teach me about prosody? What does mimicking her mom teach me?” He became offended by the prospect of a work that de-centered his language, accent, and what was most familiar to him. The Cyborg’s Prosody is a reversal of what is perceived as a foreign accented voice in the United States into a performance for both the cyborg and the player. I introduce the term western vocal drag to convey the caricature of gender through drag performance, which is apropos and akin to the vocal affect many non-western speakers effectuate in their speech.

The concept of western vocal drag became a way for me to understand and contemplate the ways that language becomes performative through its embodiment. Whether it is learning American vernacular or the complex tenses that give meaning to speech acts, there is always a failure or queering of language when a particular affect and accent is emphasized in one’s speech. The delivery of speech acts is contingent upon setting, cultural context, and whether or not there is a type of transaction occurring between the speaker and listener. In terms of enhancement of speech and accent to conform to a dominant language in the workplace and in relation to global linguistic capitalism, scholar Vijay A. Ramjattan states that there is no such thing as accent elimination or even reduction. Rather, an accent is modified. The stakes are high when taking into consideration the marketing and branding of software such as Sanas AI that proposes an erasure of non-dominant foreign accented voices.

The biggest fear related to the use of artificial intelligence within voice recognition and speech technologies is the return to a Standard American English (and accent) preferred by a general public that ceases to address, acknowledge, and care about linguistic diversity and inclusion. The technology itself has been marketed as a way for corporations and the BPO companies they hire to mind the mental health of the call center agents subjected to racism and xenophobia just by the mere sound of their voice and accent. The challenge, moving forward, is reversing the need to serve the western world.

A transorality or vocality presents itself when thinking about scholar April Baker-Bell’s work Black Linguistic Consciousness. When Black youth are taught and required to speak with what is considered Standard American English, this presents a type of disciplining that perpetuates raciolinguistic ideologies of what is acceptable speech. Baker-Bell focuses on an antiracist linguistic pedagogy where Black youth are encouraged to express themselves as a shift towards understanding linguistic bias. Deeply inspired by her scholarship, I started to wonder how to begin framing language learning in terms of a multi-consciousness that includes cultural context and affect as a way to bridge gaps in understanding.

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Or, let’s re-think this concept or idea that a bad version of English exists. As Cathy Park Hong brilliantly states, “Bad English is my heritage…To other English is to make audible the imperial power sewn into the language, to slit English open so its dark histories slide out.” It is necessary for us all to reconfigure our perceptions of how we listen and communicate, so that we no longer perpetuate the seeking of familiarity and agreement but instead encourage respecting and honoring our differences.

Featured Image: Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Dorothy R. Santos, Ph.D. (she/they) is a Filipino American storyteller, poet, artist, and scholar whose academic and research interests include feminist media histories, critical medical anthropology, computational media, technology, race, and ethics. She has her Ph.D. in Film and Digital Media with a designated emphasis in Computational Media from the University of California, Santa Cruz and was a Eugene V. Cota-Robles fellow. She received her Master’s degree in Visual and Critical Studies at the California College of the Arts and holds Bachelor’s degrees in Philosophy and Psychology from the University of San Francisco. Her work has been exhibited at Ars Electronica, Rewire Festival, Fort Mason Center for Arts & Culture, Yerba Buena Center for the Arts, and the GLBT Historical Society.

Her writing appears in art21, Art in America, Ars Technica, Hyperallergic, Rhizome, Slate, and Vice Motherboard. Her essay “Materiality to Machines: Manufacturing the Organic and Hypotheses for Future Imaginings,” was published in The Routledge Companion to Biology in Art and Architecture. She is a co-founder of REFRESH, a politically-engaged art and curatorial collective and serves as a member of the Board of Directors for the Processing Foundation. In 2022, she received the Mozilla Creative Media Award for her interactive, docu-poetics work The Cyborg’s Prosody (2022). She serves as an advisory board member for POWRPLNT, slash arts, and House of Alegria.

tape-reel

REWIND! . . .If you liked this post, you may also dig:

Your Voice is (Not) Your PassportMichelle Pfeifer 

“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens

Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari 

Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso

The Sound of What Becomes Possible: Language Politics and Jesse Chun’s 술래 SULLAE (2020)Casey Mecija

Look Who’s Talking, Y’all: Dr. Phil, Vocal Accent and the Politics of Sounding White–Christie Zwahlen

Listening to Modern Family’s Accent–Inés Casillas and Sebastian Ferrada

The Cyborg’s Prosody, or Speech AI and the Displacement of Feeling

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneer Works in Brooklyn, NY. Diedrick, a 2021 Mozilla Creative Media Award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It kicked off with Amina Abbas-Nazari’s post, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice. Then, Golden Owens took a deep historical dive into the racialized sound of servitude in America and how this impacts Intelligent Virtual Assistants. Last week, Michelle Pfeifer explored how some nations are attempting to draw sonic borders, despite the fact that voices are not passports. Today, Dorothy R. Santos wraps up the series with a meditation on what we lose due to the intensified surveilling, tracking, and modulation of our voices. [To read the full series, click here–JS]

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

In 2010, science fiction writer Charles Yu wrote a story titled “Standard Loneliness Package,” in which emotions are outsourced to another human being. Yu’s story is a literal, albeit fictitious, depiction of what emotional labor might entail and the considerations it demands, and it was published a year before Apple introduced Siri as its official voice assistant for the iPhone. Humans are not meant to be viewed as a type of technology, yet capitalist and neoliberal logics continue to turn to technology as a solution to erase or filter what is least desirable, even if that means the literal modification of voice, accent, and language. What do these actions do to the body at risk of severe fragmentation and compartmentalization?

I weep.

I wail.

I gnash my teeth.

Underneath it all, I am smiling. I am giggling.

I am at a funeral. My client’s heart aches, and inside of it is my heart, not aching, the opposite of aching—doing that, whatever it is.

 Charles Yu, “Standard Loneliness Package,” Lightspeed: Science Fiction & Fantasy, November 2010

Yu sets the scene by providing specific examples of feelings of pain and loss that might be handed off to an agent who absorbs the feelings. He shows us, in one way, what a world might look and feel like if we were to go to the extreme of eradicating and offloading our most vulnerable moments to an agent or technician meant to take on this labor. Although the story was written well over a decade ago, its prescient take on the future of feelings is not too far off from where we find ourselves in 2023. How does the voice play into these connections between Yu’s story and what we’re facing in the technological age of voice recognition, speech synthesis, and assistive technologies? How might we re-imagine having the choice to displace our burdens onto another being or entity? Taking a cue from Yu’s story, technologies are being created that pull at the heartstrings of our memories and nostalgia. Yet what happens when we are thrust into a perpetual state of grieving and loss?

Humans are made to forget. Unlike a computer, we are fed information required for our survival. When it comes to language and expression, it is often a stochastic process of figuring out for whom we speak and who is on the receiving end of our communication and speech. Artist and scholar Fabiola Hanna believes polyvocality necessitates an active and engaged listener, which then produces our memories. Machines have become the listeners to our sonic landscapes as well as the capturers, surveyors, and documenters of our utterances.

A Call Center, 1 December 2014, by Abmpublicidad (CC BY-SA 4.0)

The past few years have seen remarkable advances in voice tech from companies such as Amazon and Sanas AI, the latter a voice recognition platform that lets a user apply a vocal filter to any human voice with a discernible accent, transforming the speech into Standard American English. Yet their hopes for accent elimination and voice mimicry foreshadow a future of design without justice and software development sans cultural and societal considerations, something I work through in my artwork in progress, The Cyborg’s Prosody (2022-present).

The Cyborg’s Prosody is an interactive web-based artwork (optimized for mobile) that requires participants to read five vignettes that increasingly incorporate Tagalog words and phrases that must be repeated by the player. The work serves as a type of parody, as an “accent induction school,” providing a decolonial method of exploring how language and accents are learned and preserved. The work is a response to the creation of accent reduction schools and coaches in the Philippines. Originally, the work was meant to be a satire and parody of these types of services, but it shifted into a docu-poetic work about my mother’s immigration story and her experience of learning and becoming fluent in American English.

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Even though English is a compulsory language in the Philippines, it is a language learned within the parameters of an educational institution and not common speech outside of schools and businesses. From the call center agents hired at Vox Elite, a BPO company based in the Philippines, to a Filipino immigrant navigating her way through a new environment, the embodiment of language became apparent throughout the stages of research and the creative interventions of the past few years.

In Fall 2022, I gave an artist talk about The Cyborg’s Prosody to a room of predominantly older, white, cisgender male engineers and computer scientists. Apparently, my work caused a stir in one of the conversations between a small group of attendees. A couple of the engineers chose not to address me directly, but I overheard a debate between guests with one of the engineers asking, “What is her project supposed to teach me about prosody? What does mimicking her mom teach me?” He became offended by the prospect of a work that de-centered his language, accent, and what was most familiar to him. The Cyborg’s Prosody is a reversal of what is perceived as a foreign accented voice in the United States into a performance for both the cyborg and the player. I introduce the term western vocal drag to convey the caricature of gender through drag performance, which is apropos and akin to the vocal affect many non-western speakers effectuate in their speech.

The concept of western vocal drag became a way for me to understand and contemplate the ways that language becomes performative through its embodiment. From learning American vernacular to mastering the complex tenses that give meaning to speech acts, there is always a failure or queering of language when a particular affect and accent is emphasized in one’s speech. The delivery of speech acts is contingent upon setting, cultural context, and whether or not there is a type of transaction occurring between the speaker and listener. In terms of enhancement of speech and accent to conform to a dominant language in the workplace and in relation to global linguistic capitalism, scholar Vijay A. Ramjattan states that there is no such thing as accent elimination or even reduction. Rather, an accent is modified. The stakes are high when taking into consideration the marketing and branding of software such as Sanas AI that proposes an erasure of non-dominant foreign accented voices.

The biggest fear related to the use of artificial intelligence within voice recognition and speech technologies is the return to a Standard American English (and accent) preferred by a general public that ceases to address, acknowledge, and care about linguistic diversity and inclusion. The technology itself has been marketed as a way for corporations and the BPO companies they hire to mind the mental health of the call center agents subjected to racism and xenophobia just by the mere sound of their voice and accent. The challenge, moving forward, is reversing the need to serve the western world.

A transorality or vocality presents itself when thinking about scholar April Baker-Bell’s work Black Linguistic Consciousness. When Black youth are taught and required to speak with what is considered Standard American English, this presents a type of disciplining that perpetuates raciolinguistic ideologies of what is acceptable speech. Baker-Bell focuses on an antiracist linguistic pedagogy where Black youth are encouraged to express themselves as a shift towards understanding linguistic bias. Deeply inspired by her scholarship, I started to wonder how to begin framing language learning in terms of a multi-consciousness that includes cultural context and affect as a way to bridge gaps in understanding.

Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Or, let’s re-think this concept or idea that a bad version of English exists. As Cathy Park Hong brilliantly states, “Bad English is my heritage…To other English is to make audible the imperial power sewn into the language, to slit English open so its dark histories slide out.” It is necessary for us all to reconfigure our perceptions of how we listen and communicate, so that they no longer perpetuate the search for familiarity and agreement but instead encourage respecting and honoring our differences.

Featured Image: Still from artist’s mock-up of The Cyborg’s Prosody (2022-present), copyright Dorothy R. Santos

Dorothy R. Santos, Ph.D. (she/they) is a Filipino American storyteller, poet, artist, and scholar whose academic and research interests include feminist media histories, critical medical anthropology, computational media, technology, race, and ethics. She has her Ph.D. in Film and Digital Media with a designated emphasis in Computational Media from the University of California, Santa Cruz and was a Eugene V. Cota-Robles fellow. She received her Master’s degree in Visual and Critical Studies at the California College of the Arts and holds Bachelor’s degrees in Philosophy and Psychology from the University of San Francisco. Her work has been exhibited at Ars Electronica, Rewire Festival, Fort Mason Center for Arts & Culture, Yerba Buena Center for the Arts, and the GLBT Historical Society.

Her writing appears in art21, Art in America, Ars Technica, Hyperallergic, Rhizome, Slate, and Vice Motherboard. Her essay “Materiality to Machines: Manufacturing the Organic and Hypotheses for Future Imaginings,” was published in The Routledge Companion to Biology in Art and Architecture. She is a co-founder of REFRESH, a politically-engaged art and curatorial collective and serves as a member of the Board of Directors for the Processing Foundation. In 2022, she received the Mozilla Creative Media Award for her interactive, docu-poetics work The Cyborg’s Prosody (2022). She serves as an advisory board member for POWRPLNT, slash arts, and House of Alegria.


Les traumatismes routiers en Afrique de l’Ouest : l’épidémie oubliée (Road Traffic Injuries in West Africa: The Forgotten Epidemic)

Edited by Emmanuel Bonnet and Aude Nikiema

To access the HTML version of the book, click here.
To download the PDF, click here.

This edited volume presents the analyses and scientific experiences of several researchers on road traffic injuries in Africa. Its ambition is to bring together, in French, the scarce knowledge produced on this subject in Africa. The themes are varied and converge on three main axes: improving data on road accidents, the public health challenge posed by road traffic injuries, and the importance of knowledge transfer in helping to develop road safety policies suited to each country’s context. Road traffic injuries are today a forgotten epidemic on the continent, one that must nevertheless be brought under control if states are to meet the Sustainable Development Goal target of halving road injuries and deaths.

Print ISBN: 978-2-925128-25-0

PDF ISBN: 978-2-925128-26-7

DOI: 10.5281/zenodo.8114953

352 pages

Cover design: Kate McDonnell, drawing by Glez, for FASeR – ICI – Santé

Publication date: June 2023

***

In memoriam

Preface 1 – Professor Nicolas Meda, former Minister of Health of Burkina Faso

Preface 2 – Tidjane Amadou Kamagaté

I. Improving data production to act more effectively and reduce road accidents

3. “My right to walk, my right to live”: Pedestrian mortality, roads, and environmental characteristics in Benin – Y. Glèlè Ahanhanzo, D. Kpozèhouen, J. C. Sossa, Ghislain E. Sopoh, H. Tedji, K. Yete, A. Levêque

4. Performance of the routine health information system in the surveillance of road traffic injuries in Benin – D. Kpozèhouen, Y. Glèlè Ahanhanzo, G. E. Sopoh, A. Kpozèhouen, C. Azandjèmè, A. Levêque

II. Road traffic injuries: a neglected public health issue

6. Situational analysis of the care of road accident victims in Burkina Faso: a challenge for achieving the Sustainable Development Goals – J.-B. Guiard-Schmid, T. Comte, S. A. Ouattara, S. Gandema, A. B. Tapsoba, Y. L. Bambara, E. Bonnet

7. Disability and associated factors among road accident victims in Benin: a study in five public and faith-based hospitals in urban and peri-urban areas – Y. Glèlè-Ahanhanzo, A. Kpozèhouen, N. M. Paraïso, P. Makoutodé, Chabi O. Alphonse Biaou, E. Remacle, E.-M. Ouendo, A. Levêque

10. Public transport road accidents: analysis based on a 2019 coach accident in Benin – Y. Glèlè Ahanhanzo, D. Daddah, A. Kpozèhouen, B. Hounkpè Dos Santos, K. Quenum, M. Bato, A. Levêque

III. Disseminating knowledge to change behaviors and road safety policies

Dire le vrai. Perspectives situées (Telling the Truth: Situated Perspectives)

Edited by Gilbert Willy Tio Babena

To access the HTML version of the book, click here.
To download the PDF, click here.

“The true, the truth, is what denouncers and whistleblowers alike strive for, along with ‘plain-speaking’ people and those who practice outing, the ‘tattletales’ of the schoolyard and the confessed of the confessional. These truth-tellers, whom everything separates, nevertheless share a common goal: to reveal information, to move from the hidden to the known, to flush out secrets, the unknown, the clandestine. Around these themes, a thousand things, from all eras and all cultures, could be said and written,” writes Marie-Anne Paveau in her call for works of the mind to furnish the walls of La Villa réflexive. This book is the product of that epistemological experiment on the theme of telling the truth. Linked yet equally independent of one another, the chapters are written in a decolonial style that breaks the codes of positivist writing to offer readers a reflexive look at truth from situated perspectives. To broaden the horizon, the volume offers a plurality of multimodal resources (hyperlinks, images, QR codes for videos and audio) that can be consulted in both the digital and print versions.

A book in the collection Réflexivités et expérimentations épistémologiques

Print ISBN: 978-2-925128-30-4

PDF ISBN: 978-2-925128-29-8

DOI: 10.5281/zenodo.8139735

319 pages

Cover design: Kate McDonnell

Publication date: 2023

This book is published with the support of the Faculté des Arts, Lettres et Sciences Humaines of the Université de Maroua.

***

In memoriam

Opening the door to discursive truth – Gilbert Willy Tio Babena

I. What is truth? Who tells the truth, and how?
III. The veils or falsifications of truth

Machine Anxiety or Why I Should Close TikTok–But Don’t

It’s Wednesday evening and I have no specific plans. I’m chilling on the sofa, scrolling through TikTok and Instagram while Below Deck Sailing Yacht is playing in the background. Even though I’m not moving, I’m tired and bored. I have multiple screens open, trying to distract myself, with no plans and no people available to save me from myself. It’s just me, my apps–and my algorithms.

Suddenly I receive an alert from my period tracker telling me that my period will most likely come on Friday, which explains why I feel so tired and why I keep craving bread even though I’m gluten intolerant. Unsurprisingly, Instagram and TikTok know this as well. Comfy clothing ads, vegan Ben & Jerry’s, pilates against general inflammation, Beyoncé’s Renaissance tour, carrot salad for hormonal balance, retinol, Margiela/Ann Demeulemeester/Rick Owens/The Row clean girl aesthetic mixed with ‘you probably have autism’ videos.

An Uber Eats pop-up reminds me that I can order Indian instead of cooking. 40-50 min away. Great, butter chicken it is.

I recently heard someone talk about how there’s an epidemic of people who are self-diagnosing with autism and ADHD, yes, exhibiting symptoms of it, but also maybe it’s just the dopamine burnout caused by the same apps that made them self-diagnose in the first place. I wonder how much this might be true. Are these algorithms so good at analysing our behaviour that they end up reflecting it back to us in a digested 20-second video that allows us to identify things in ourselves that we weren’t aware of? Or are we consuming this content at such a large and quick rate that we end up becoming what they predict us to be? In other words, are we fulfilling their prophecies or do they know us better than we know ourselves?

Did I really want those Tabi ballet flats, or did the algorithm make me buy them? Do I have ADHD, or am I experiencing dopamine burnout? Am I having a style crisis because I am an evolving human being or because the algorithm keeps pushing me into the clean girl aesthetic while also wanting me to lean into Y2K and Rick Owens vibes but also learn how to wear a fucking hair clip correctly because that’s what the Copenhagen-influenced Amsterdam girlies are into? Am I ready to move into a cabin in the woods and live my girl moss dreams or go clubbing in Berghain, pluck my eyebrows to death and bleach my hair? Is my stomach hurting because all of this is going through my head and my screens (yes, multiple) at the same time? Because you can have it all girl, you go girl, work-life balance girl, celery juice girl. Or do I have that rare, incurable undiagnosed disease the algorithm told me to google on WebMD?

Am I going blind and need glasses, or should I just listen to my mom and stare at the distance for at least 10 minutes every hour?

When I was younger, things seemed easier but also a lot more serious. Now things seem unserious and a lot more complicated. Nothing is that important anymore, but everything seems to have a thousand more layers; everything is more nuanced and complex while at the same time, stupid. I feel very old saying that. And yet I grew up in the middle of a digital revolution. I can’t remember a time when there wasn’t a computer in my house. I remember being very little and playing with the Paintbrush app on my father’s Macintosh. His cellphone was the size of a brick, and you could hear the sound of the internet over the house phone. Yes, we had landlines. We had a set of CDs containing the Encyclopaedia Britannica instead of Google and Wikipedia. Facts seemed to be a lot easier to identify, and fiction was a thing left for the arts. Nobody was talking about the Pope wearing Moncler, and Trump being president would have been unimaginable.

In the era of AI and misinformation, life has never been more confusing. Facts and fiction are blended seamlessly. All information seems extremely urgent and, at the same time, irrelevant. It has made sceptics out of all of us, hyper-aware that at any time, we can be deceived.

But the nature of AI has always been deceptive. Its success has always relied on its capacity to imitate, trick or replicate human language. In Alan Turing’s Computing Machinery and Intelligence, deception is placed at the centre of the test to determine a machine’s capacity to exhibit intelligent behaviour. Turing’s test proposed judging machines on their capacity to make human subjects believe they are human. So as technology advanced, AI scientists began studying human reactions to the machine in order to improve its performance, building on Turing’s work. And even though deception was never the main objective, creating the illusion of intelligence rather than intelligence itself became the force driving sentient-like technologies like AI. As Simone Natale points out, »While debates have largely focused on the possibility that the pursuit of strong AI would lead to forms of consciousness similar or alternative to that of humans, where we have landed might more accurately be described as the creation of a range of technologies that provide an illusion of intelligence—in other words, the creation not of intelligent beings but of technologies that humans perceive as intelligent«. Turing named this ‘the imitation game’.

As algorithms got better at imitating us and scientists got better at training them, we also became lazier at recognising them, making it easier for us to fall into the illusion.

In Deceitful Media: Artificial Intelligence and Social Life after the Turing Test, Natale states that »At the roots of technology’s association with magic lies, in fact, its opacity. Our wonder at technological innovations often derives from our failure to understand the technical means through which they work, just as our amazement at a magician’s feat depends partly on our inability to understand the trick«. Yet in my experience, knowing does not guarantee that we will not fall into the illusion. In fact, most people who enjoy magic tricks are not ignorant of how the tricks are performed, at least in the most superficial way. Magic shows still attract masses of people ready to surrender to fantasy in exchange for entertainment, aware that it is not real magic. Even more, magicians themselves are avid consumers of the trickery of their colleagues. Because deep down, we all want to be believers.

Our interactions with AI are based, as with many technologies and other systems of belief, on the projections we make in the spaces left by the illusion. We project into the machine our desire to see something that confirms our expectations. We deeply want to believe that what we want to see, hear, feel, and experience is really there.

It’s not surprising that in our loneliest or most boring moments, we turn to our machines for companionship, wanting to believe in the promise of closeness, of something that reflects back to us our deepest fears, wildest dreams and general anxieties, all repackaged in a shiny wrapper of entertainment or distraction, and the promise of taking our problems away.

AI will save the world, solve climate change, inequality, work, creativity blocks, mental health!

When Eliza, one of the first chatbots, built in 1964, was put to the test with the secretary of its programmer, Joseph Weizenbaum, also known as one of the fathers of modern AI, the secretary famously asked him to leave the room since the conversation between her and the machine had turned too personal, too intimate. You see, Eliza was programmed to emulate a non-directive psychotherapist, and Weizenbaum’s intention was to prove how communication between humans and machines was superficial. Instead, he ended up proving the opposite, sort of. The secretary ended up projecting her desire to be heard onto the machine. In psychology, projection is defined as what happens when ‘inside’ content is mistaken to be coming from the ‘outside’ or the Other.

She, too, wanted to believe.

In the summer of 2022, I graduated from the Sandberg Institute, where I did a temporary master’s program called F for Fact. The program (which was extended for two more years) focused on investigating different ways of knowing through artistic research. The blurry lines between Facts and Fiction, the way knowledge is produced. What knowledge is and what it is not.

One of the things you need to do to graduate is write a thesis. At the time, I wasn’t looking forward to it. My bachelor’s thesis had left me with some PTSD, and I didn’t want to sound stupid or like I was trying too hard. So I thought it would be a great idea to ask GPT2 (just released on early sign-up access) to write my thesis for me. I had always been fascinated by technology, and I was then in my Google Earth era, working on a project about the materiality of digital technologies and the Internet and researching transatlantic internet cable networks and lithium mines. So it seemed like a great idea to use this new technology to write my thesis for me.

What started as a simple ‘I am too lazy and insecure, let a machine do it for me’ became an exploration of how these technologies would change the way we create knowledge and whether knowledge could be generated. Could we outsource knowledge creation to machines? Could I ‘cheat’ my way out of the thesis? Long story short, it turns out I couldn’t. Automation was not liberation. I still needed to write it, and probably it would have been easier just to write it myself. But the process became the topic of my thesis and the object of the research itself.

AKA, I ended up writing about co-writing with AI while co-writing with AI.

Looking back, one of the most interesting parts of co-writing was that even though I went into the process thinking, ‘I’m not gonna fall for it’, at times, I ended up forgetting I was talking to algorithms. Turns out I also wanted to believe in the promise of a machine that could help me overcome my anxieties around writing. And it kind of did, just not in the way I was expecting it to.

What happened is that I ended up needing to be extremely precise in what I wanted to write about, or else the algorithms would take me to topics I didn’t want or need to address. Nowadays, this is really clearly exemplified by how prompt engineers are becoming more and more important when working with AI. The capacity to get what you want from the algorithms is directly linked to the quality of the prompt. AKA what you ask is what you get, but not always what you want.

I couldn’t get what I wanted, a quick thesis. But I got what I needed, a bunch of AIs making me realise I was not as bad of a writer as I thought I was.

In the end, the thesis became a collection of texts co-written by me and a number of programs: GPT2, GPT3, Eliza and Replika. On top of that came a reflective text, written only by me, in which I looked back on the joys and frustrations of trying to co-write with AI, the problematic things in it (biases and all) and the need to engage with them with a critical eye.

I started as a sceptic, stumbled into my own projections and beliefs, and I ended up falling in love with the glitchy parts of my dear machines, which offered digital companionship and collaboration when I most needed it.

It is now a Monday evening, and while I am working on this text, I am thinking about a lecture I recently gave at an AI department of a Dutch university. There I discussed how I work with AI to co-write and collaborate on different projects such as my thesis. One of the scientists asked me whether I was afraid of AI. I answered that I was afraid of what humans could do with it. Another asked if I thought artists would be replaced by AIs and whether the future of human art was dead. I pointed out how with the invention of the camera, people predicted the end of painting, yet painters still paint. And in time, the camera itself became a tool for artistic production, not only documentation. It didn’t take that long before artists started experimenting with the new medium and adopting it as part of their set of tools. It will be the same with AI. Artists will always make art. And technological advancements will always go through a period of adjustment before they are normalised and demystified. When the phonograph was invented, people feared it was capturing the souls of the people it recorded. When Kindles appeared, people speculated it was the end of books. And in the face of Spotify, there’s a return to LPs. So perhaps what will happen is that human-made art will be the equivalent of organic vegetables with a ‘made by humans’ certificate on it.

A long time ago, we dreamed of and feared the end of work with the rise of machines. We dreamed of having more free time but also feared losing our jobs. Instead, we are working more than ever, having less free time and inventing more jobs.

We keep hoping that making more machines will save us from the problems other machines have created. Recently I read an article in the Guardian pointing out that big tech people and politicians constantly mention how through AI, they will be able to make smarter choices and solve big problems for the benefit of humankind. But this is in itself quite stupid. Solving climate change doesn’t need smarter minds to create new solutions. We already know what to do; we just need people ready to put economic interests aside in order to reverse decades of damage.

As I sit on my sofa, again, trying not to fall into another TikTok hole, I realise I’m getting anxious again because I’m thinking about climate change, and that shit is scary. So I ask ChatGPT to make a joke to lighten the mood.

This is what came out: »Why did the AI start worrying about climate change? Because it realised it couldn’t CTRL+Z the planet’s rising temperature.«

It also explained why it should be funny:

»This joke is funny because it combines a humorous play on words with a relevant and serious topic. The punchline revolves around the computer keyboard shortcut ‘CTRL+Z,’ which is commonly used to undo or revert actions in many software programs. In this context, it is used metaphorically to suggest that the AI realised it couldn’t undo or reverse the planet’s rising temperature caused by climate change.

The joke cleverly blends technology and the environment, highlighting the AI’s recognition of a problem that cannot be easily undone or reversed. By juxtaposing the familiar computer shortcut with the pressing issue of climate change, the joke elicits a humorous response while also highlighting the seriousness and urgency of addressing environmental concerns.«

I am suddenly very aware of the irony of using a super-polluting technology to talk about climate change. And even more, to make jokes about it. According to Bloomberg, the initial training of GPT-3 consumed more energy than what the average household consumes in 3 years. And I’m not even mentioning all the material resources like minerals and water and cables and Roombas cleaning the servers and people making sure the Roombas are emptied and plugged in, and coders working for 11 dollars per hour, and us training them for free. The old adage: if you are not paying for it, you are the product.

It is hard not to fall for something designed to mirror us and keep us entertained. We all like looking at ourselves, and the algorithms know it. We all want to see more of what we like, especially when the world seems to be ending… once more. The one thing that truly comforts me is knowing how many times humanity has predicted the end of the world throughout history. And yet here we are.

I feel guilty from time to time because I am using technologies that I know are bad for me and the environment. But I also make a point to try to use them critically and plant my tiny seed of resistance. Clicking ‘no’ on all cookie pop-ups. Deleting my apps now and then. Using alternative platforms and programs. Blocking all ads. Going out for a walk instead of staring at my phone. Helping an old lady cross the street and carry the groceries. I remember Michelle Young’s words: »Power is relative. No one of us can bring about change by ourselves. But for each of us, our part is vital.«

I’ll try listening more to my mom and stare out of the window for ten minutes now and then.

To not webMD my symptoms. To not buy the next thing TikTok tells me to buy because it won’t solve all my problems.

I’ll also try not to feel so guilty and do more. To acknowledge that our relationship with technology is very intimate and intricate but also problematic. Like a codependent relationship. Maybe we should all go to therapy.

But also, like @ummsimonee said, »If you speak what you want into existence, at the very least, the Instagram algorithm will hear you.«

And my personal favourite: »Nobody knows me like the notes app does.«

This text is an adaptation of a lecture at Spui25 on Co-Creation with AI and organised by the Hmm. 

 

Your Voice is (Not) Your Passport

In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneerworks in Brooklyn, NY. Diedrick, 2021 Mozilla Creative Media award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It kicked off with Amina Abbas-Nazari’s post, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice. Last week, Golden Owens took a deep historical dive into the racialized sound of servitude in America and how this impacts Intelligent Virtual Assistants. Today, Michelle Pfeifer explores how some nations are attempting to draw sonic borders, despite the fact that voices are not passports.–JS

In the 1992 Hollywood film Sneakers, depicting a group of hackers led by Robert Redford performing a heist, one of the central security architectures the group needs to get around is a voice verification system. A computer screen asks for verification by voice and Robert Redford uses a “faked” tape recording that says “Hi, my name is Werner Brandes. My voice is my passport. Verify me.” The hack is successful and Redford can pass through the securely locked door to continue the heist. Looking back at the scene today it is a striking early representation of the phenomenon we now call a “deep fake” but also, to get directly at the topic of this post, the utter ubiquity of voice ID for security purposes in this 30-year-old imagined future.

In 2018, The Intercept reported that Amazon filed a patent to analyze and recognize users’ accents to determine their ethnic origin, raising suspicion that this data could be accessed and used by police and immigration enforcement. While Amazon seemed most interested in using voice data to target users with discriminatory advertising, the jump to increased surveillance seemed frighteningly close, especially because people’s affective and emotional states are already being used for the development of voice profiling and voice prints that expand surveillance and discrimination. For example, voice prints of incarcerated people are collected and extracted to build databases of calls that include the voices of people on the other end of the line.


“Collect Calls From Prison” by Flickr User Cobalt123 (CC BY-NC-SA 2.0)

What strikes me most about these vocal identification and recognition technologies is that their appeal, for advertisers, surveillers, and policers alike, seems to lie in the idea that the voice offers direct access to someone’s identity. Supposedly there are fewer possibilities to evade or obfuscate identification when it is performed via the voice. It “is seen as a solution that makes it nearly impossible for people to hide their feelings or evade their identities.” The voice here works as an identification document, as a passport. While passports can be lost or forged, accent supposedly gives access to the identity of a person that is innate, unchanging, and tied to the body. But passports are not only identification documents. They are also media of mobility, globally unequally distributed, that allow or inhibit movement across borders. States want to know who crosses their borders, who enters and leaves their territory, increasingly so in the name of security.

What, then, when the voice becomes a passport? Voice recognition systems used in asylum administration in the Global North show what is at stake when the voice, and more specifically language and dialect, come to stand in for a person’s official national identity. Several states including Denmark, the Netherlands, the United Kingdom, Switzerland, Sweden, as well as Australia and Canada have been experimenting with establishing the voice, or more precisely language and dialect, to take on the passport’s role of identifying and excluding people.

“Passport Brochure” by Craig James (CC BY-NC 2.0)

In the 1990s, not long after the release of Sneakers, these states started to use a crude form of linguistic analysis, later termed Language Analysis for the Determination of Origin (LADO), as part of the administration of claims to asylum. In cases where people could not provide a form of identity documentation or when those documents were considered fraudulent or inauthentic, caseworkers would look for national identity in people’s languages and dialects. LADO analyzes acoustic and phonetic features of recorded speech samples in relation to phonetics, morphology, syntax, and lexicon, as well as intonation and pronunciation.

The problems and assumptions of this linguistic analysis are multiple, as linguists have pointed out. First, it falsely ties language to territorial and geopolitical boundaries, following a language ideology that maps linguistic boundaries onto geographical ones. Nation-state borders on the African continent and in the Middle East, however, were drawn by colonial powers without consideration of linguistic communities. Second, LADO treats language and dialect as static, monoglossic, and a stable index of identity. These assumptions produce the idea of a linguistic passport in which language is supposed to function as a form of official state identification that distributes possibilities and impossibilities of movement and mobility. As a result, the voice becomes a passport and simultaneously functions as a border, by inscribing language into territoriality. As Lawrence Abu Hamdan has written and shown through his sound art work The Freedom of Speech Itself, LADO functions to control territory, produce national space, and establish a correlation between voice and citizenship.

Language Analysis is the Second Step in Claiming Asylum in the UK (Home Office Science: Migration Border Analysis, 2012 p.37), see also K. Wilson’s LADO: An Investigative Study

I’ll add that the very idea of a passport has a history rooted in forms of colonial governance and population control, the modern nation-state, and territorial borders. The body is intimately tied to the history of passports and biometrics. For example, German colonial administrators in South-West Africa, present-day Namibia and a German overseas colony from 1884 to 1919, instituted a pass badge system to control the mobility of Indigenous people, create an exploitable labor force, and institute and reinforce white supremacy and colonial exploitation. Media and Black Studies scholar Simone Browne uses the term “digital epidermalization” to describe how surveillance becomes inscribed and encoded on the skin. Now, it’s coming for the voice too.

In 2016 the German government took LADO a step further and started to use what it calls voice biometric software, which supposedly identifies the place of origin of people who are seeking asylum. Someone’s spoken dialect is supposedly recognized and verified on the basis of speech recordings with an average length of 25.7 seconds by software employed by the German Federal Office for Migration and Refugees (in German abbreviated as BAMF). The dialect recognition software currently used by German asylum administrators distinguishes between five large Arabic dialect groups: Levantine, Maghreb, Iraqi, Egyptian, and Gulf. Just recently this was expanded with language models for Farsi, Dari and Pashto. There are plans to expand the use of this software to other European countries, evidenced by BAMF traveling to other countries to demonstrate it.

“voice vectors” Universal (CC0 1.0)

This “branding” of BAMF’s software stands in stark contradiction to its functionality. The software’s error rate is 20 percent. It is based on a speech sample as short as 26 seconds. People are asked to describe pictures while their speech is recorded. The software then indicates a percentage of probability for each dialect and produces a score sheet that could indicate the following: 74% Egyptian, 13% Levantine, 8% Gulf Arabic, 5% Other. The interpretation of results is left to the caseworkers without clear instructions on how to weigh those percentages against each other. The discretion left to caseworkers makes it more difficult to appeal asylum decisions. According to the Ministry, the results are supposed to give indications and clues about someone’s origin and are not a decision-making tool. However, as I have argued elsewhere, algorithmic or so-called “intelligent” bordering practices assume neutrality and objectivity and thereby conceal forms of discrimination embedded in technologies. In the case of dialect recognition the score sheet’s indicated probabilities produce a seeming objectivity that might sway caseworkers in one direction or another. Moreover, the software encodes distinctions between who is deserving of protection and who is not, a feature of asylum and refugee protection regimes critiqued by many working in the field.

The functionality and operations of the software are also intentionally obscured. Researcher and sound artist Pedro Oliveira addresses the many black-boxed assumptions entering the dialect recognition technology. For instance, in his work Das hätte nicht passieren dürfen he engages with the labor involved in producing sound archives and speech corpora and challenges “the idea that it might be feasible, for the purposes of biometric assessment, to divorce a sound’s materiality from its constitution as a cultural phenomenon.” Oliveira’s work counters the lack of transparency and accountability of the BAMF software. Information about its functionality is scarce. Freedom of information requests and parliamentary inquiries about the technical and algorithmic properties and training data of the software were denied. The information was classified because “the information can be used to prepare conscious acts of deception in the asylum proceeding and misuse language recognition for manipulation,” the German government argued. While the German authorities are not necessarily worried about deepfakes like the faked Brandes recording used to bypass a security system in Sneakers, the specter of manipulation of the software looms large.

The software’s poor functionality can have drastic consequences for asylum decisions. Vice reported in 2018 the story of Hajar, whose name was changed to protect his identity. Hajar’s asylum application in Germany was denied on the basis of a dialect recognition software that supposedly indicated that he was a Turkish speaker and, thus, could not be from the Autonomous Region Kurdistan as he claimed. Hajar, who speaks the Kurdish dialect Sorani, had been instructed by BAMF to speak into a telephone receiver and describe an image in his first language. The software’s results indicated a 63% probability that Hajar speaks Turkish, and the caseworker concluded that Hajar had lied in his asylum hearings about his origin and his reasons to seek asylum in Germany. Hajar continued to appeal the asylum decision. The software is not equipped to verify Sorani and should not have been used on Hajar in the first place.
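
How a system can assign a confident-looking percentage to a language it was never built to recognize can be illustrated with a toy example. The BAMF software’s internals are classified, so what follows is not a reconstruction of it. It is a minimal sketch, assuming a generic closed-set classifier that converts raw scores into percentages with a softmax; the label list, the raw scores, and the function names are all invented for illustration. The point is only that a system restricted to a fixed list of labels spreads 100 percent across that list, so speech in a language outside the list is still reported as one of the languages inside it.

```python
import math

# Hypothetical label set, chosen for illustration only.
# The language actually spoken in this scenario (Sorani) is not on the list.
LABELS = ["Turkish", "Levantine Arabic", "Egyptian Arabic", "Farsi"]


def softmax_percentages(raw_scores):
    """Convert raw scores into percentages that always sum to 100."""
    highest = max(raw_scores)
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp(score - highest) for score in raw_scores]
    total = sum(exps)
    return [100.0 * value / total for value in exps]


def score_sheet(raw_scores):
    """Pair every known label with its share of the 100 percent."""
    return dict(zip(LABELS, softmax_percentages(raw_scores)))


# Invented raw scores for a recording in a language the model never saw.
sheet = score_sheet([1.2, 0.3, 0.1, 0.6])
top_label = max(sheet, key=sheet.get)

for label, percent in sheet.items():
    print(f"{label}: {percent:.0f}%")
print("Reported as:", top_label)
```

Under these assumptions the score sheet has no way to say “none of the above”: the highest percentage goes to whichever known label happens to score best, and the resulting number reads as evidence even though the speaker’s language was never among the options.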

Biometric Island, Gdansk University of Technology 2021, Image by Dawid Weber  (CC BY 3.0)

Why the voice? It seems that bureaucrats and caseworkers saw it as a way to identify people with ease and scale language analysis more easily. It is also important to consider the context in which this so-called voice biometry is used. Many people who seek asylum in Germany cannot provide identity documents like passports, birth certificates, or identification cards. This is the case because people cannot take them with them as they flee, they are lost or stolen on people’s journeys, or they are confiscated by traffickers. Many forms of documentation are also not accepted as legitimate by state authorities. Generally, language analysis is used in a hostile political context in which claims to asylum are increasingly treated with suspicion.

The voice as a part of the body was supposed to provide an answer to this administrative problem of states. In response to the long summer of migration in 2015, Germany hired McKinsey to overhaul its administrative processes, save money, accelerate asylum procedures, and make them more “efficient.” In July 2017, the head of the Department for Infrastructure and Information Technology of the German Federal Office for Migration and Refugees hailed the office’s new voice and dialect recognition software as “unrivaled world-wide” in its capacity to determine the region of origin of asylum seekers and to “detect inconsistencies” in narratives about their need for protection. More than identification documents, personal narratives, or other features of the body, the voice, the BAMF expert suggests, is the medium that allows for the indisputable verification of migrants’ claims to asylum, ostensibly pinpointing their place of origin.

Voice and dialect recognition technologies are established by policy makers and security industries as particularly successful tools to produce authentic evidence about the origin of asylum seekers. Asylum seekers have to sound as if they are from a region that warrants their claims to asylum, which requires the translation of voices into geographical locations. As a result, automated dialect recognition becomes more valuable than someone’s testimony. In other words, the voice, abstracted into a percentage, becomes the testimony. Here, the software, similarly to other biometric security systems, is framed as a more objective, neutral, and efficient way of identifying people’s country of origin than human decision-makers. As the German Migration agency argued in 2017: “The IT supported, automated voice biometric analysis provides an independent, objective and large-scale method for the verification of the indicated origin.”

“Soundwave and Spectrogram of ‘CIRCLE’” by Lena Zipp, University of Zurich (CC BY-NC-ND 2.0)

The use of dialect recognition puts forth an understanding of the voice and language that pinpoints someone’s origin to a certain place, without a doubt and without considering someone’s movement or history. In this sense, the software inscribes a vision of a sedentary, ahistorical, static, fixed, and abstracted human into its operations. As a result, geographical borders become reinforced and policed as fixed boundaries of territorial sovereignty. This vision of the voice ignores multiple mobilities and (post)colonial histories and reinscribes the borders of nation-states that reproduce racial violence globally. Dialect recognition reproduces precarity for people seeking asylum. As I have shown elsewhere, in the absence of other forms of identification and the presence of generalized suspicion of asylum claims, accent accumulates value while the content of testimony becomes devalued. Asylum applicants are placed in a double bind, simultaneously being incited to speak during asylum procedures and having their testimony scrutinized and placed under general suspicion.

Similar to conventional passports, the linguistic passport also represents a structurally unequal and discriminatory regime that needs to be abolished. The software was framed as providing a technical solution to a political problem that intensifies the violence of borders. We need to shift to pose other questions as well. What do we want to listen to? How could we listen differently? How could we build a world in which nation-states and passports are abolished and the voice is not a passport but can be appreciated in its multiplicity, heteroglossia, and malleability? How do we want to live together on a planet increasingly becoming uninhabitable?

Featured Image: Voice Print Sample–Image from US NIST

Michelle Pfeifer is a postdoctoral fellow in Artificial Intelligence, Emerging Technologies, and Social Change at Technische Universität Dresden in the Chair of Digital Cultures and Societal Change. Their research is located at the intersections of (digital) media technology, migration and border studies, and gender and sexuality studies and explores the role of media technology in the production of legal and political knowledge amidst struggles over mobility and movement(s) in postcolonial Europe. Michelle is writing a book titled Data on the Move: Voice, Algorithms, and Asylum in Digital Borderlands that analyses how state classifications of race, origin, and population are reformulated through the digital policing of constant global displacement.

REWIND! . . .If you liked this post, you may also dig:

“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens

Beyond the Every Day: Vocal Potential in AI Mediated Communication –Amina Abbas-Nazari 

Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso

The Sound of What Becomes Possible: Language Politics and Jesse Chun’s 술래 SULLAE (2020)–Casey Mecija

The Sonic Roots of Surveillance Society: Intimacy, Mobility, and Radio–Kathleen Battles

Acousmatic Surveillance and Big Data–Robin James

Your Voice is (Not) Your Passport

In summer 2021, sound artist, engineer, musician, and educator Johann Diedrick convened a panel at the intersection of racial bias, listening, and AI technology at Pioneerworks in Brooklyn, NY. Diedrick, 2021 Mozilla Creative Media award recipient and creator of such works as Dark Matters, is currently working on identifying the origins of racial bias in voice interface systems. Dark Matters, according to Squeaky Wheel, “exposes the absence of Black speech in the datasets used to train voice interface systems in consumer artificial intelligence products such as Alexa and Siri. Utilizing 3D modeling, sound, and storytelling, the project challenges our communities to grapple with racism and inequity through speech and the spoken word, and how AI systems underserve Black communities.” And now, he’s working with SO! as guest editor for this series (along with ed-in-chief JS!). It kicked off with Amina Abbas-Nazari’s post, helping us to understand how Speech AI systems operate from a very limiting set of assumptions about the human voice. Last week, Golden Owens took a deep historical dive into the racialized sound of servitude in America and how this impacts Intelligent Virtual Assistants. Today, Michelle Pfeifer explores how some nations are attempting to draw sonic borders, despite the fact that voices are not passports.–JS

In the 1992 Hollywood film Sneakers, depicting a group of hackers led by Robert Redford performing a heist, one of the central security architectures the group needs to get around is a voice verification system. A computer screen asks for verification by voice and Robert Redford uses a “faked” tape recording that says “Hi, my name is Werner Brandes. My voice is my passport. Verify me.” The hack is successful and Redford can pass through the securely locked door to continue the heist. Looking back at the scene today it is a striking early representation of the phenomenon we now call a “deep fake” but also, to get directly at the topic of this post, the utter ubiquity of voice ID for security purposes in this 30-year-old imagined future.

In 2018, The Intercept reported that Amazon filed a patent to analyze and recognize user’s accents to determine their ethnic origin, raising suspicion that this data could be accessed and used by police and immigration enforcement. While Amazon seemed most interested in using voice data for targeting users for discriminatory advertising, the jump to increasing surveillance seemed frighteningly close, especially because people’s affective and emotional states are already being used for the development of voice profiling and voice prints that expand surveillance and discrimination. For example, voice prints of incarcerated people are collected and extracted to build databases of calls that include the voices of people on the other end of the line.


“Collect Calls From Prison” by Flickr User Cobalt123 (CC BY-NC-SA 2.0)

What strikes me most about these vocal identification and recognition technologies is how their appeal seems to lie, for advertisers, surveillers, and policers alike that voice is an attractive method to access someone’s identity. Supposedly there are less possibilities to evade or obfuscate identification when it is performed via the voice. It “is seen as a solution that makes it nearly impossible for people to hide their feelings or evade their identities.” The voice here works as an identification document, as a passport. While passports can be lost or forged, accent supposedly gives access to the identity of a person that is innate, unchanging, and tied to the body. But passports are not only identification documents. They are also media of mobility, globally unequally distributed, that allow or inhibit movement across borders. States want to know who crosses their borders, who enters and leaves their territory, increasingly so in the name of security.

What, then, when the voice becomes a passport? Voice recognition systems used in asylum administration in the Global North show what is at stake when the voice, and more specifically language and dialect, come to stand in for a person’s official national identity. Several states including Denmark, the Netherlands, the United Kingdom, Switzerland, Sweden, as well as Australia and Canada have been experimenting with establishing the voice, or more precisely language and dialect, to take on the passport’s role of identifying and excluding people.

“Passport Brochure” by Craig James (CC BY-NC 2.0)

In the 1990s—not too far from the time of Sneakers release—they started to use a crude form of linguistic analysis, later termed Language Analysis for the Determination of Origin (LADO), as part of the administration of claims to asylum. In cases where people could not provide a form of identity documentation or when those documents would be considered fraudulent or inauthentic, caseworkers would look for this national identity in the languages and dialects of people. LADO analyzes acoustic and phonetic features of recorded speech samples in relation to phonetics, morphology, syntax, and lexicon, as well as intonation and pronunciation.

The problems and assumptions of this linguistic analysis are multiple as pointed out and critiqued by linguists. 1) it falsely ties language to territorial and geopolitical boundaries and assumes that language is intimately tied to a place of origin according to a language ideology that maps linguistic boundaries onto geographical boundaries. Nation-state borders on the African continent and in the Middle East were drawn by colonial powers without considerations of linguistic communities. 2) LADO thinks of language and dialect as static, monoglossic and a stable index of identity. These assumptions produce the idea of a linguistic passport in which language is supposed to function as a form of official state identification that distributes possibilities and impossibilities of movement and mobility. As a result, the voice becomes a passport and it simultaneously functions as a border, by inscribing language into territoriality. As Lawrence Abu Hamdan has written and shown through his sound art work The Freedom of Speech itself, LADO functions to control territory, produce national space, and attempts to establish a correlation between voice and citizenship.

Language Analysis is the Second Step in Claiming Asylum in the UK (Home Office Science: Migration Border Analysis, 2012 p.37), see also K. Wilson’s LADO: An Investigative Study

I’ll add that the very idea of a passport has a history rooted in forms of colonial governance and population control and the modern nation-state and territorial borders. The body is intimately tied to the history of passports and biometrics. For example, German colonial administrators in South-West Africa, present day Namibia, and German overseas colony from 1884 to 1919 instituted a pass batch system to control the mobility of Indigenous people, create an exploitable labor force, and institute and reinforce white supremacy and colonial exploitation. Media and Black Studies scholar Simone Browne describes biometrics as “digital epidermalization,” to describe how surveillance becomes inscribed and encoded on the skin. Now, it’s coming for the voice too.

In 2016 the German government took LADO a step further and started to use what they call a voice biometric software that supposedly identifies the place of origin of people who are seeking asylum. Someone’s spoken dialect is supposedly recognized and verified on the basis of speech recordings with an average lengths of 25,7 seconds by a software employed by the German Ministry for Migration and Refugees (in German abbreviated as BAMF). The now used dialect recognition software used by German asylum administrators distinguishes between 4 large Arabic dialect groups: Levantine, Maghreb, Iraqi, Egyptian, and Gulf dialect. Just recently this was expanded with language models for Farsi, Dari and Pashto. There are plans to expand this software usage to other European countries, evidenced by BAMF traveling to other countries to demonstrate their software.

“voice vectors” Universal (CC0 1.0)

This “branding” of BAMF’s software stands in stark contradiction to its functionality. The software’s error rate is 20 percent. It is based on a speech sample as short as 26 seconds. People are asked to describe pictures while their speech is recorded, the software then indicates a percentage of probability of the spoken dialect and produces a score sheet that could indicate the following: 74% Egyptian, 13% Levantine, 8% Gulf Arabic, 5 % Other. The interpretation of results is left to the caseworkers without clear instructions on how to weigh those percentages against each other. The discretion left to caseworkers makes it more difficult to appeal asylum decisions. According to the Ministry, the results are supposed to give indications and clues about someone’s origin and are not a decision-making tool. However, as I have argued elsewhere, algorithmic or so-called “intelligent” bordering practices assume neutrality and objectivity and thereby conceal forms of discrimination embedded in technologies. In the case of dialect recognition the score sheet’s indicated probabilities produce a seeming objectivity that might sway case-workers in one direction or another. Moreover, the software encodes distinctions between who is deserving of protection and who is not; a feature of asylum and refugee protection regimes critiqued by many working in the field.

The functionality and operations of the software are also intentionally obscured. Research and sound artist Pedro Oliveira addresses the many black-boxed assumptions entering the dialect recognition technology. For instance, in his work Das hätte nicht passieren dürfen he engages with the labor involved in producing sound archives and speech corpora and challenges “ the idea that it might be feasible, for the purposes of biometric assessment, to divorce a sound’s materiality from its constitution as a cultural phenomenon.” Oliveira’s work counters the lack of transparency and accountability of the BAMF software. Information about its functionality is scarce. Freedom of information requests and parliamentary inquiries about the technical and algorithmic properties and training data of the software were denied as the information was classified because “the information can be used to prepare conscious acts of deception in the asylum proceeding and misuse language recognition for manipulation,” the German government argued.  While it is not necessarily deepfakes like the one Brandes produced to forego a security system that the German authorities are worried about, the specter of manipulation of the software looms large. 

The software’s poor functionality can have drastic consequences for asylum decisions. Vice reported in 2018 the story of Hajar, whose name was changed to protect his identity. Hajar’s asylum application in Germany was denied on the basis of a dialect recognition software that supposedly indicated that he was a Turkish speaker and, thus, could not be from the Autonomous Region Kurdistan as he claimed. Hajar, who speaks the Kurdish dialect Sorani, had been instructed by BAMF to speak into a telephone receiver and describe an image in his first language. The software’s results indicated a 63% probability that Hajar speaks Turkish, and the caseworker concluded that Hajar had lied in his asylum hearings about his origin and his reasons to seek asylum in Germany. Hajar continued to appeal the asylum decision. The software is not equipped to verify Sorani and should not have been used on Hajar in the first place.

Biometric Island, Gdansk University of Technology 2021, Image by Dawid Weber  (CC BY 3.0)

Why the voice? It seems that bureaucrats and caseworkers saw it as a way to identify people with ease and to scale up language analysis. It is also important to consider the context in which this so-called voice biometry is used. Many people who seek asylum in Germany cannot provide identity documents like passports, birth certificates, or identification cards. This is because people cannot take these documents with them as they flee, because the documents are lost or stolen on their journeys, or because they are confiscated by traffickers. Many forms of documentation are also not accepted as legitimate by state authorities. Generally, language analysis is used in a hostile political context in which claims to asylum are increasingly treated with suspicion.

The voice as a part of the body was supposed to provide an answer to this administrative problem of states. In response to the long summer of migration in 2015, Germany hired McKinsey to overhaul its administrative processes, save money, accelerate asylum procedures, and make them more “efficient.” In July 2017, the head of the Department for Infrastructure and Information Technology of the German Federal Office for Migration and Refugees hailed the office’s new voice and dialect recognition software as “unrivaled world-wide” in its capacity to determine the region of origin of asylum seekers and to “detect inconsistencies” in narratives about their need for protection. More than identification documents, personal narratives, or other features of the body, the voice, the BAMF expert suggests, is the medium that allows for the indisputable verification of migrants’ claims to asylum, ostensibly pinpointing their place of origin.

Voice and dialect recognition technologies are established by policy makers and security industries as particularly successful tools to produce authentic evidence about the origin of asylum seekers. Asylum seekers have to sound as if they are from a region that warrants their claims to asylum, which requires the translation of voices into geographical locations. As a result, automated dialect recognition becomes more valuable than someone’s testimony. In other words, the voice, abstracted into a percentage, becomes the testimony. Here, the software, similarly to other biometric security systems, is framed as a more objective, neutral, and efficient way of identifying people’s country of origin than human decision-makers. As the German Migration agency argued in 2017: “The IT supported, automated voice biometric analysis provides an independent, objective and large-scale method for the verification of the indicated origin.”

Soundwave and Spectrogram of “CIRCLE” by Lena Zipp, University of Zurich (CC BY-NC-ND 2.0)

The use of dialect recognition puts forth an understanding of the voice and language that pinpoints someone’s origin to a certain place, without a doubt and without considering someone’s movement or history. In this sense, the software inscribes a vision of a sedentary, ahistorical, static, fixed, and abstracted human into its operations. As a result, geographical borders become reinforced and policed as fixed boundaries of territorial sovereignty. This vision of the voice ignores multiple mobilities and (post)colonial histories and reinscribes the borders of nation-states that reproduce racial violence globally. Dialect recognition reproduces precarity for people seeking asylum. As I have shown elsewhere, in the absence of other forms of identification and the presence of generalized suspicion of asylum claims, accent accumulates value while the content of testimony becomes devalued. Asylum applicants are placed in a double bind, simultaneously being incited to speak during asylum procedures and having their testimony scrutinized and placed under general suspicion.

Like conventional passports, the linguistic passport represents a structurally unequal and discriminatory regime that needs to be abolished. The software was framed as providing a technical solution to a political problem that intensifies the violence of borders. We also need to shift toward other questions. What do we want to listen to? How could we listen differently? How could we build a world in which nation-states and passports are abolished and the voice is not a passport but can be appreciated in its multiplicity, heteroglossia, and malleability? How do we want to live together on a planet increasingly becoming uninhabitable?

Featured Image: Voice Print Sample–Image from US NIST

Michelle Pfeifer is a postdoctoral fellow in Artificial Intelligence, Emerging Technologies, and Social Change at Technische Universität Dresden in the Chair of Digital Cultures and Societal Change. Their research is located at the intersections of (digital) media technology, migration and border studies, and gender and sexuality studies and explores the role of media technology in the production of legal and political knowledge amidst struggles over mobility and movement(s) in postcolonial Europe. Michelle is writing a book titled Data on the Move: Voice, Algorithms, and Asylum in Digital Borderlands that analyses how state classifications of race, origin, and population are reformulated through the digital policing of constant global displacement.


REWIND! . . .If you liked this post, you may also dig:

“Hey Google, Talk Like Issa”: Black Voiced Digital Assistants and the Reshaping of Racial Labor–Golden Owens

Beyond the Every Day: Vocal Potential in AI Mediated Communication–Amina Abbas-Nazari

Voice as Ecology: Voice Donation, Materiality, Identity–Steph Ceraso

The Sound of What Becomes Possible: Language Politics and Jesse Chun’s 술래 SULLAE (2020)–Casey Mecija

The Sonic Roots of Surveillance Society: Intimacy, Mobility, and Radio–Kathleen Battles

Acousmatic Surveillance and Big Data–Robin James