Google Releases Dataset to Address Gender Bias

In an effort to address gender bias in its neural machine translation (NMT) technologies, Google has recently released a new dataset that appears to improve how accurately Google Translate handles gendered language.

“One research area has been using context from surrounding sentences or passages to improve gender accuracy,” reads a recent blog post from the company’s AI team. “This is a challenge because traditional NMT methods translate sentences individually, but gendered information is not always explicitly stated in each individual sentence.”


In late June, four researchers at Google published the Translated Wikipedia Biographies dataset, a collection of Wikipedia entries, each about a person (identified as male or female), a rock band, or a sports team (the latter two are considered genderless). According to Google, the new dataset appears to significantly improve gender accuracy, though there’s still work to be done.

“It’s worth mentioning that by releasing this dataset, we don’t aim to be prescriptive in determining what’s the optimal approach to address gender bias,” the team writes. “This contribution aims to foster progress on this challenge across the global research community.” 

In the blog post, Google gives an example of a Spanish paragraph whose subject is female, though the subject is not explicitly mentioned in every sentence. Because Spanish is a pro-drop language that does not always include a subject in each sentence, a translation engine that works sentence by sentence can mistranslate such sentences into English using masculine pronouns rather than the correct feminine ones (or vice versa). When evaluated with the Translated Wikipedia Biographies dataset, Google Translate more frequently produced translations using the correct gender pronouns.
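The mechanism described above can be illustrated with a toy sketch. The snippet below is purely hypothetical — the "translator" is a hard-coded lookup, not Google's NMT system or any real API — but it shows why per-sentence translation of a pro-drop language loses gender information that passage-level context preserves.

```python
# Hypothetical sketch: translating a pro-drop Spanish passage into English.
# The second Spanish sentence drops its subject, so an English pronoun must
# be chosen; without context, this toy system defaults to masculine.

SENTENCES = [
    "Marie Curie nació en Varsovia.",  # subject explicit: female
    "Estudió física en París.",        # subject dropped (pro-drop)
]

def translate_sentence(sentence, subject_gender=None):
    """Toy 'translation': when the Spanish subject is dropped, pick an
    English pronoun from context if available, else default to 'he'."""
    if sentence.startswith("Marie Curie"):
        return "Marie Curie was born in Warsaw."
    pronoun = {"female": "She", "male": "He"}.get(subject_gender, "He")
    return f"{pronoun} studied physics in Paris."

# Sentence-by-sentence translation: the second sentence carries no gender
# cue on its own, so the toy system falls back to the masculine default.
isolated = [translate_sentence(s) for s in SENTENCES]

# Context-aware translation: the gender established in the first sentence
# is carried forward to later sentences in the same passage.
contextual = []
gender = None
for s in SENTENCES:
    if "Marie Curie" in s:
        gender = "female"
    contextual.append(translate_sentence(s, subject_gender=gender))

print(isolated[1])    # "He studied physics in Paris."  (wrong)
print(contextual[1])  # "She studied physics in Paris." (correct)
```

Datasets like Translated Wikipedia Biographies make this failure mode measurable: because each entry's subject gender is known, researchers can count how often a system's pronouns match it.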

Back in April, MultiLingual reported on gender bias in Google’s translation engine, after numerous social media users noticed problems with how Google Translate rendered non-gendered language into gendered languages. Oftentimes, such translations reflected stereotypical depictions of gender roles — e.g., translating a non-gendered pronoun from Finnish into English as “he” when associated with the word “doctor” but translating the same pronoun as “she” when associated with the word “teacher.”

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.
