Developing machine translation to help Indigenous refugees navigate immigration courts

Seeking asylum in the United States can be complex even for speakers of high-resource languages like Spanish. For speakers of less widely spoken languages, it can be even more difficult.

This is a problem many refugees from Central America and Mexico face as they seek asylum in the United States and attempt to navigate the country’s labyrinthine immigration system. While most of these individuals speak Spanish, many from more remote or rural areas speak only Indigenous languages, for which US courts have few interpreters on hand.

To help ease the linguistic challenges for these refugees, a team of researchers at the University of Southern California (USC) is developing machine translation (MT) for speakers of K’iche’, an Indigenous language that’s in high demand within the US immigration courts. Currently, the researchers are working on a text-to-text system that translates between English and K’iche’ to help individuals seeking asylum communicate more effectively with their lawyers and the judges on their cases, but they plan to develop a speech-to-speech system over the coming years.

“People are being directly adversely impacted because there aren’t interpreters available for their languages in legal aid organizations,” Katy Felkner, a Ph.D. student at USC working on the project, said in a media statement.

K’iche’ is just one of the Indigenous American languages in high demand. Although languages like K’iche’, Q’anjob’al, and Mam, which are Indigenous to Guatemala and Mexico, are each spoken natively by fewer than 2 million people, they are among the 25 most commonly spoken languages in US immigration courts. A 2019 report in the New York Times found that the language barrier can be extremely severe for non-Spanish-speaking Central Americans and Mexicans seeking refuge from violence in their home countries.

Despite the relatively high demand for interpreters working in these languages, some states have to depend on interpreters from out of state to work on immigration cases. Often, interpreters working in these languages interpret into Spanish and work with another interpreter to convey the meaning from Spanish to English (and vice versa).

And while there aren’t enough interpreters to easily meet the demand, there also aren’t any easily accessible MT systems for these languages. Because there’s little parallel written data, it’s difficult to develop high-quality MT for these languages: the researchers say they’re working with fewer than 30,000 sentences of data for their K’iche’ project. While the K’iche’-English MT engine that the researchers are working on won’t be a be-all and end-all solution to the low number of interpreters working in these languages in the United States, it does have the potential to help mitigate some of the obstacles refugees may face.

“This is a concrete and immediate way that we can use natural language processing for social good,” Felkner said.

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.



MultiLingual Media LLC