Earlier this week, Google announced that its machine translation (MT) platform, Google Translate, will now offer 24 more languages.
Many of the newly added languages are underserved by other mainstream MT programs: of the 24 languages added to Google Translate this week, only three are available on Microsoft Translator. The addition brings the number of languages offered on Google Translate to 133.
In addition to making MT more accessible for these languages, Google noted that it used a new technical approach to add them to the platform. These are the first languages to be added using zero-shot MT, in which a model learns to translate a language from monolingual text alone. That is, to prepare these languages for Google Translate, the company did not need any translated data, only text written in the original language. Similar technology has been used to improve accessibility for other languages considered under-resourced.
“While this technology is impressive, it isn’t perfect,” Google’s Isaac Caswell wrote in a May 11 blog post announcing the addition. “And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example.”
Altogether, the languages added to Google Translate’s lineup are spoken by more than 300 million people. Many are considered minority languages in the countries where they are spoken, and several are indigenous to South Asia and Africa. Here’s the complete list of the 24 languages that have been added to the platform, organized by continent:
The African languages added to Google Translate are spoken widely throughout the continent. They represent a broad group of language families, though the Niger-Congo family is the most prominent on the list. Sierra Leonean Krio, an English-based creole language, was also included.
- Sierra Leonean Krio
Most of the Asian languages added to Google Translate are concentrated within India and neighboring countries. The company also added Sanskrit, the holy language of Hinduism, to the platform. Outside of the Indian subcontinent, languages like Kurdish and Ilocano will now be available on Google Translate.
- Kurdish (Sorani)
While Spanish and Portuguese may be the most well-known and widely used languages in South America, they’re far from the only ones: many languages indigenous to the continent have large populations of speakers. The three American languages added to Google Translate are among the most widely spoken languages indigenous to the continent. They’re also the first languages indigenous to either North or South America to be added to the platform.