Addressing the Gender Bias in Machine Translation

Google Translate has once again sparked controversy, with users noticing a sexist gender bias in its translations. After a group of users took to social media to discuss the system’s biases — even making it to the front page of Reddit — two researchers at the University of Cambridge described their proposed fix in an article for The Conversation.

Translating from a largely non-gendered language like English to a gendered one — Spanish, for example — requires machine translation (MT) systems to figure out which gender to assign to words in the target language, and the result often reflects societal perceptions and stereotypes. When languages like Finnish or Hungarian — which don’t have gendered pronouns (meaning these languages don’t make a distinction between “he” and “she,” for example) — enter the mix, it can lead to even more obvious gender biases, as one Finnish researcher pointed out on Twitter.
Such biases aren’t necessarily the fault of the systems’ creators, who are often aware of — and perhaps even disconcerted by — them. Back in 2018, Google published a blog post addressing this very issue and attempted to resolve it by offering dual translations. When sentences like the example above are translated one at a time, Google Translate provides users with two translations, one using “he” and the other, “she.” However, when more than one sentence occurs in a given input, the output does not include the gender-inclusive dual translation.

These issues are more or less reflective of wider societal issues with stereotyping and insensitivity. When MT systems are trained, they receive massive amounts of data for a given language pair, using that data to learn the language’s syntax, its vocabulary, and how words have most often been translated by humans in the past. The system then makes statistically based predictions about which structures and words occur most frequently in translations, and as a result there is room for human biases — or simple frequency anomalies — to make their way into the algorithm. If humans have translated “he is a doctor” 100 times but “she is a doctor” only 50 times, and that data is fed into the training algorithm, the algorithm will simply pick the option with the highest number of previous translations.
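The frequency effect described above can be sketched in a few lines of Python. This is a deliberately simplified toy — real MT systems use neural networks, not raw counts — and the corpus and numbers are the article’s hypothetical 100-to-50 example, not real data:

```python
from collections import Counter

# Toy "training data": hypothetical past human translations of a
# gender-neutral source sentence (counts taken from the article's example).
past_translations = (
    ["he is a doctor"] * 100 +
    ["she is a doctor"] * 50
)

def most_frequent_translation(candidates):
    """Pick whichever candidate translation appeared most often in training."""
    counts = Counter(candidates)
    return counts.most_common(1)[0][0]

print(most_frequent_translation(past_translations))  # -> he is a doctor
```

Because “he is a doctor” outnumbers “she is a doctor” two to one, the purely frequency-driven choice always lands on the masculine form, regardless of what the source sentence actually meant.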

In their article for The Conversation, Stefanie Ullmann and Danielle Saunders discussed their approach to retraining MT systems to reduce bias: once a bias is identified, they argued, smaller data sets can be used as targeted lessons to balance things out and produce more accurate translations — “a bit like an afternoon of gender-sensitivity training at work,” the researchers wrote.
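Continuing the toy counting sketch from earlier, the effect of such a targeted “lesson” can be illustrated by adding a small, gender-balanced set of examples to a skewed corpus. The counts and sentences here are hypothetical, and real systems fine-tune neural models rather than adjusting raw counts:

```python
from collections import Counter

# Skewed toy training counts (the article's hypothetical 100-vs-50 example).
corpus = Counter({"he is a doctor": 100, "she is a doctor": 50})

# A small, targeted balancing set: extra examples of the underrepresented form.
balancing_set = Counter({"she is a doctor": 50})

# Counts after the targeted retraining step: both forms are now equally frequent.
adapted = corpus + balancing_set
print(adapted)
```

After the balancing set is added, neither translation dominates on frequency alone, so the system no longer has a purely statistical reason to default to “he.”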

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.
