Linguistic gender bias goes further than you may expect

Researchers at New York University have found that the English word “people” may not be as gender-neutral as you might think.

Of course, the word’s dictionary definition does not denote a specific gender identity. However, using artificial intelligence (AI) to analyze a corpus of 630 billion words paints a different picture of how English speakers actually use the word. In a paper published in Science Advances, the researchers found that speakers tend to use “people” in contexts more frequently associated with men than with women. The study could shed further light on gender bias in machine translation (MT) and in the corpora used to develop MT models, which MultiLingual reported on last year.

“Many forms of bias, such as the tendency to associate ‘science’ with men more than women, have been studied in the past, but there has been much less work on how we view a ‘person,’ ” said April Bailey, the lead author of the paper.

In recent years, users of Google Translate and other MT systems have criticized the technology for gender bias in certain translations. In one example, a user noted that the gender-neutral third-person pronoun in Finnish was translated as “he” in the sentence “He takes care of things” but as “she” in the sentence “She is taking care of the child.” This is essentially the result of the AI replicating gender biases that already exist in human language. Still, the recent study suggests that gender bias in human language goes even deeper.

“The ‘people = men’ bias in word embeddings likely spills over into the wide range of downstream artificial intelligence applications that use word embeddings, including machine translation,” the study reads.
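For readers curious about what a bias “in word embeddings” looks like in practice, here is a minimal sketch of one common way to measure such an association: compare how close the vector for “people” sits to the vector for “men” versus the vector for “women,” using cosine similarity. The three-dimensional vectors below are invented purely for illustration and are not taken from the study, and the helper names (`cosine_similarity`, `association_gap`) are hypothetical. Real embeddings have hundreds of dimensions and are learned from large text corpora.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, made up for this example only.
# They are NOT real embeddings and NOT data from the paper.
toy_embeddings = {
    "people": [0.9, 0.4, 0.1],
    "men": [0.8, 0.5, 0.2],
    "women": [0.3, 0.9, 0.2],
}


def association_gap(embeddings, target, group_a, group_b):
    """Similarity of target to group_a minus similarity of target to group_b."""
    sim_a = cosine_similarity(embeddings[target], embeddings[group_a])
    sim_b = cosine_similarity(embeddings[target], embeddings[group_b])
    return sim_a - sim_b


gap = association_gap(toy_embeddings, "people", "men", "women")
print(f"'people' similarity to 'men' minus similarity to 'women': {gap:.3f}")
```

In this toy setup, a positive gap means “people” sits closer to “men” than to “women,” while a gap near zero would suggest no lean in either direction. Any system that reuses the same vectors inherits the same lean.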

Example of Google Translate’s efforts to reduce gender bias in translation.

In response to criticism, tech companies like Google have attempted to mitigate some of the issues associated with this gender bias. Google Translate, for example, now outputs multiple translations for words and sentences that could be interpreted with more than one gender (as in the Finnish-to-English examples mentioned earlier). The researchers in the current study note that these efforts have focused only on explicitly gendered language and do not take into account the underlying gender bias of seemingly gender-neutral words like “people.”

“Ongoing efforts to ‘debias’ word embeddings to prevent them from replicating such biases have yielded mixed results and have yet to consider the fundamental ‘people = men’ bias we uncover here,” the researchers write. “We hope that the present work guides future efforts to debias natural language processing algorithms.”

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.


MultiLingual Media LLC