Linguistic prejudice, race and machine translation

January 3, 2018

There are two basic approaches to grammar: the kind that says “this is what the rule book has said since 1858” and the kind that says “language evolves, and this is how it’s actually being used in the current world to communicate these specific concepts and grammatical differences.” The way pockets of minority speakers use language has always fascinated me, although when I was young it would make me cringe. As a teenager, I thought it was extremely strange, for example, that the black-cap Mennonite community that I sometimes mingled with used word constructions I’d never heard of in real life; they greeted me with “welcome here” instead of “hi,” and their pronunciation of “school” sounded more like “skewel.” It sounded super-archaic to me, as if in eschewing modern forms of dress they’d also decided, subconsciously or consciously, to eschew modern linguistic constructions.

During grad school, one of my linguistics professors delved into the linguistic nuances of African-American Vernacular English (AAVE). He told us that there was a grammatical, dialectal difference between “this coffee cold” and “this coffee be cold,” and that the distinction did not exist in standard American English. “This coffee cold” was a remark about a temporary state; “this coffee be cold” was a remark about a known, habitual quality of this specific kind of coffee. It’s similar to “le café a été froid” and “le café était froid,” I imagine, or perhaps more accurately, to “this coffee is cold” and “this coffee is usually cold.”

AAVE drops certain sounds (but not others) in spoken language; there’s a regularity to the practice. Because there are rules, this is no more “incorrect” in English than when it happens in French or in certain dialects of Spanish; Cuban Spanish, for example, may also drop sounds with a practiced regularity. AAVE drops “to be” verbs in some instances, but then, so do standard Hebrew and Russian. Standard English drops the verb in phrases such as “every man an island unto himself.” White dialectal English drops it in phrases such as “this floor needs swept.”

In short, contrary to the opinions of grumpy white grammar nazis, AAVE isn’t “wrong,” it just does its own thing, having adapted the way language always adapts. And this is important because for a certain portion of the population, these grammatical differences become a reason to mistrust African-Americans, to dismiss them as “uneducated” or “lazy” because they sound different. To treat them with less innate respect.

A study put out in June of this year, for example, concluded that police use less respectful language with black members of the community than with white members of the community, even controlling for high-crime areas and reasons for the police stop. The study could find no explanation for the difference, in fact, other than the race of the people being spoken to by police.

Now, I find it hard to believe that the majority of police are linguistically reacting purely to the skin color of the person in front of them. What seems more likely to me, as a linguist, is that they react linguistically to linguistic difference (real or perceived). When a person speaks a non-standard dialect, or is assumed to speak a non-standard dialect, that person is usually placed in a more suspect category. If their speech itself is not “correct,” what else is not correct about them?

I consider myself open-minded on such matters, but I am by no means immune to this. I noticed as I was recently watching an interview with Seattle Seahawks-turned-Oakland Raiders player Marshawn Lynch that I couldn’t stop the subconscious commentary in the back of my head on his pronunciation of “ask” as “axe,” or the myriad of ways he sounded like a stereotypical black man. His way of speaking sounded incorrect to my brain; the unintentional emotional result ranged between slight irritation and amusement. Neither is a particularly respectful reaction. The guy standing next to me, on the other hand, remarked “I love that he’s himself, and he isn’t dumbing himself down for the media. He sounds so black. He’s such a badass.”

This guy had grown up siding with his black friends against stereotypical white jock bullies and Klansmen in the south, so his firsthand experience with African-American dialects was way more intimate than mine. More friendly, more familiar. His subconscious was trained differently than mine.

And I thought, you know, he’s totally right. It’s pretty badass that this guy is refusing to change who he is, refusing to give up his linguistic heritage, in the pursuit of fame or being more palatable to the money machine of corporate America.

I posit that, given my own reaction, white Americans are less likely to believe a man committed a crime if he sounds like them; if he speaks with the cadence and vocabulary of a white man. This is, of course, a difficult theory to prove in a double-blind study, but it is borne out anecdotally. As this study shows, it is true that many people are implicitly biased against accents unlike their own and certain accents in particular, whether or not they realize it. It is also true, for example, that all-white juries are 16% more likely to convict a black defendant than a white defendant.

It seems likely that dialect plays a role, and it certainly has in individual trials. After Trayvon Martin was fatally shot in Sanford, Florida, by George Zimmerman, Martin’s friend Rachel Jeantel, who had been on the phone with him moments before the shooting, testified against Zimmerman. Jeantel spoke non-standard English. Her speech patterns were widely mocked on social media, while her testimony was ignored by jurors. A prize-winning linguistics write-up put it this way: “one of the six jurors (B37) said, in a TV interview with CNN’s Anderson Cooper after the trial (July 15, 2013), that she found Jeantel both ‘hard to understand’ and ‘not credible’. In the end, despite her centrality to the case, ‘no one mentioned Jeantel in [16+ hour] jury deliberations. Her testimony played no role whatsoever in their decision’ (Juror Maddy, as reported in Bloom 2014:148). In a sense, “Jeantel’s dialect was found guilty as a prelude to and contributing element in Zimmerman’s acquittal.”

Accent and dialect influence how you’re perceived. I once conducted my own experiment on accent: during my first semester of grad school, I was employed taking phone surveys about Charmin Ultra toilet paper. This was extremely boring, so I ran my own secondary experiment in the background: I would alternate calls in an Irish accent, in a standard American accent, and in a Southern accent. I was curious if accent played any role in people’s willingness to take a survey about toilet paper; the majority of people hung up on any accent, but maybe there was a competitive edge I could use to complete more surveys, and thus to earn more money per hour.

I was calling non-Southern white Americans, by the sound of it; the calls went to random numbers pulled from somewhere like northern Arizona or Wyoming. I kept track of completed surveys in each accent. After doing this enough times, a pattern started to emerge: people slightly preferred talking to a woman who spoke in a soft-and-subtle Irish accent, followed by standard, crisp American English; Southern American English was a distant last. Few people seemed to take Southern Accent Girl seriously enough to complete a toilet paper survey with her voice on the other end of the line.

Southern accents are often associated with being “uneducated” or “dumb,” even to listeners as young as five years old, so this was not a huge surprise. And what American, on the other hand, doesn’t love the Irish?

And lest this be considered an American phenomenon, British studies have found that speaking with a Birmingham accent (like Ozzy Osbourne) makes listeners assume that you are less intelligent than someone who speaks standard British English or has a Yorkshire accent.

Humans make judgments about dialect and language, often without realizing it. Machines, however, only do this where their data is prejudiced in some way. Data-driven linguistic models have no innate prejudice of their own: they study how humans use language and derive rules from it. Although this has certainly resulted in subtle and not-so-subtle human prejudice being codified into machine learning, it also presents an opportunity to create programs that may correct for human prejudices. Data-driven models often work best when the linguistic field is narrowed, actually, because human language is so broad. Because of this, I wonder if there will be (or could be) machine translation settings in the future that take into account the dialect of English being spoken; certainly this is a question that speech-to-text MT applications have to address. And I wonder if this, somehow, could be used to “translate” dialects in places like the courtroom, for the benefit of everyone.
