Perspectives

AI and Indigenous Language Access

Magic cure or arrogance?

By Jace Norton

The main topic of conversation at nearly every language industry event these days is artificial intelligence (AI). Understandably so: it is a hot topic, especially for those working in translation, who are seeing fewer work opportunities as clients increasingly turn to AI for their localization needs. Strangely, though, the topic of AI, with all its uncertainty and excitement, buzz and bluster, is relatively irrelevant to me as an Indigenous language interpreter and owner of a company focused primarily on Indigenous language access.

AI has almost no bearing on the amount of work that my company and I perform, and it currently adds little to no practical value to most of the work we do. Unlike many language companies, Maya Bridge is not particularly active in trying to implement AI in our workflows. Part of the difference is that, in an industry predominantly driven by written translation and localization, we focus primarily on interpretation, which is under somewhat less threat than translation. Another reason is that AI has almost no presence in, and no practical use cases for, most of the languages with which we work.

Before I continue, let me first disclose a couple of things. First, I fully admit that I am an AI skeptic and cynic. Not once did I use any AI program to help me write this article. Frankly, I tend to turn my nose up at AI-generated content, as I can typically detect it. I almost never use AI in my personal or professional life, and I believe that, in many ways, if not used with caution, AI has the ability to diminish actual intelligence. Call me old-fashioned, but I’m much more of a believer in organic intelligence.

Second, let me be clear in stating that, in this article, I am discussing language access specifically, that is, providing interpretation and translation services for critical or essential needs, for instance, in hospitals and courts. I am not implying that AI has no practical use in language preservation, which can be defined as efforts to digitize, analyze, and otherwise preserve languages. That being said, let’s begin.

Recently, I attended a conference where an individual debating the practical uses of AI in interpretation and translation argued that, someday soon, AI will solve Indigenous language access issues. He argued that rural hospitals encountering speakers of lower-diffusion languages would be able to utilize AI-based interpretation models to handle the needs of those demographics. Many people seem to be under the impression that AI will be a kind of magic cure for all issues relating to Indigenous and lower-diffusion language access. The truth is that it will not. This is because of a variety of factors, not least of which is that the use of AI for language access in the United States without human intervention is not legal for federally funded institutions and organizations.

But let’s pretend, for the sake of argument, that the law did not prohibit the use of AI interpretation and translation for language access and that we were allowed to use these tools in hospitals and courtrooms. In this scenario, even if AI becomes developed enough to do a decent job interpreting and translating for other, more mainstream languages, Indigenous languages will almost certainly be left off an AI interpreter’s list of languages. The main issue preventing AI from acting as an interpreter for Indigenous languages is that Indigenous and other low-resource languages lack the massive amount of data that AI needs to function in a way that could practically be used for interpretation and translation, and very few entities are putting in the human-based work necessary to synthetically create such data.

The truth is, there is a major difference between the demand for language access and commercial language demand. Commercial language demand is what drives companies to market their products, materials, and services in new areas; localize movies and entertainment; and offer multilingual customer support. Commercial language demand indicates where more profits could be made if other linguistic markets were targeted. Commercial language demand pays for itself and is for nonessential services and goods (think of companies that sell products, provide entertainment, etc.).

Language access refers to ensuring individuals have equitable access to essential services, such as in courts, hospitals, and schools, or in other federally funded institutions and programs, through meaningful language services. Although language access is protected by law, its growth is not necessarily incentivized by commercial interests. In some limited cases (typically in healthcare), this is not entirely true: A hospital that provides meaningful language access for its demographics, for example, will presumably attract more patients to its facilities, but certainly not to the same extent as commercial-sector entities. And realistically, commercial entities are not incentivized to go after the Indigenous language markets because they are, from a profitability perspective, insignificant. Case in point — you can watch most of your favorite Netflix shows in Spanish but not in Q’eqchi’ or Chuukese.

In addition, local governments are typically not all that interested in translating documents, producing news, and so on, into Indigenous languages, in large part because of a lack of Indigenous representation in those governments. Many people, especially those in decision-making positions, take the mistaken viewpoint that individuals who speak Indigenous languages are fluent in more dominant languages or, if they aren’t, that they eventually will be. Unfortunately, they aren’t completely wrong — each generation, fewer and fewer individuals are passing on their Indigenous languages. In any case, there is a “demand” issue that correlates with the lack of data and the lack of drive to generate such data.

Thus, you have an almost complete void of data when it comes to most Indigenous languages, especially those with smaller communities of speakers, because relatively few people are generating any kind of content (written or oral) that an AI model could access. Essentially, producing the kind of data needed to fuel an AI model requires an almost entirely altruistic motivation. Typically, the only entities very active in translating and publishing materials into many Indigenous languages such as Q’eqchi’ are religious organizations, like the Church of Jesus Christ of Latter-day Saints (LDS). While significant, even these efforts are a drop in the bucket compared with the amount of data needed to create language models for AI that would be useful for language access. Although emerging small language models (SLMs) are producing working translation models for lower-diffusion languages, these models are extremely limited in function.

Take, for example, Google’s models for some Indigenous languages. While it is notable that Google has been able to create arguably decent SLMs for its Google Translate function in languages like Q’eqchi’ and Quechua (almost certainly by scraping public data from websites like the LDS Church’s), the translations are extremely literal, even though these languages are particularly idiomatic and metaphoric. The data that would be needed to refine and improve the models simply isn’t there, and without massive undertakings by humans, that refinement won’t magically happen on its own. And aside from linguist nerds like myself wanting to look up how to say different things in a language like Quechua, there is essentially no practical use for those models in language access.

It seems that many people have this mental image of courts or hospitals using something like Google Translate to handle the immediate needs of a patient or a respondent, with individuals passing a phone back and forth and typing sentences one at a time. This mindset betrays a type of linguistic ignorance of which we speakers of dominant languages are often guilty.

Minority Indigenous communities are almost universally plagued by sociopolitical issues such as extreme poverty, lack of educational resources, political and economic marginalization, discrimination, and exploitation, often because of intentional efforts by colonial entities, both historical and ongoing. As a result, these communities typically have extremely low literacy rates in their native languages. So even if we had an AI model that could translate written materials with 100% accuracy and also convey meaning based on cultural context with 100% accuracy (which is not currently happening, even for dominant languages like Spanish), it would still be almost useless for the people who need it unless it also offered speech-to-text and text-to-speech functionality. Indigenous people who need language services because of limited proficiency in English, or in another dominant language like Spanish or French, will almost universally be unable to read or write in their native language.

Incidentally, this same kind of linguistic ignorance occurs frequently among organizations that request written translations of documents into Indigenous languages, and among the language companies that then outsource those requests to companies like Maya Bridge, blissfully unaware of how useless that written translation, on its own, would be. In reality, for most document translation requests we receive, producing a written translation into an Indigenous language yields very nearly the same result as simply leaving the source material in English or Spanish for the populations who need the information. For that reason, at Maya Bridge, we don’t offer translation solutions that don’t also include an audio accompaniment, because we know that for the target audience to truly access the content, they will need to hear it.

You may be asking, but couldn’t we just develop an AI model that could produce text-to-speech and vice versa in Indigenous languages? Sure, it’s definitely doable — if someone is willing to pay for it, if there is a competent and trained organization to lead the effort, if this organization can get enough data, and if it can then refine and improve upon the models. That’s a lot of ifs.

While SLMs for written translation can, in theory, be created relatively easily from source texts like the Bible, which is by far the most translated book in the world, the data these models would need to have any real value to Indigenous communities simply does not exist. A model that could produce text-to-speech and speech-to-text in Indigenous languages would require a massive amount of such data, and no one is all that interested in creating the “synthetic” data that would be needed because, again, there is no “demand.” I’m no data scientist, but if that capability still doesn’t really exist even for prominent languages like Spanish, which have vastly more data than Indigenous languages, then I’m skeptical that speech-to-text and text-to-speech models for Indigenous languages will be coming anytime soon.

Again, the reality is that AI multilingualism is mainly driven by commercial opportunity; in other words, money. And if there’s no money, there will be much more limited organic growth of AI into Indigenous languages. This may differ for larger Indigenous groups with more robust speaker numbers, like those in Africa. But for many Indigenous languages, unless some billionaire lingo-philanthropist emerges and altruistically invests in Indigenous language development for AI models, it is not only unlikely but, in many cases, nearly impossible for those models to emerge. And we haven’t even mentioned the fact that Indigenous languages often have anywhere from 2 to 50 regional variants that are mutually unintelligible.

In short, as it currently stands, we are a long, long way off from AI having any direct impact on Indigenous and low-resource language access. AI certainly can, however, be used to support and augment human-based efforts in language conservation, teaching, and similar work. But if you are thinking that AI will make it so that you don’t have to find a human interpreter for an Indigenous language, you will be waiting a long, long, long time.

Jace Norton is a Q’eqchi’ (Kekchi) interpreter, polyglot, and the CEO/founder of Maya Bridge Language Services, a unique, mission-driven agency focused on increasing language access for lower-diffusion and nondominant languages like the Mayan languages of Guatemala.


General Information info@multilingual.com

Subscription subscriptions@multilingual.com

Advertising
advertising@multilingual.com

Editorial Questions editor@multilingual.com

Privacy Policy