As part of an effort to build artificial intelligence (AI)-powered translation programs for unwritten, predominantly oral languages, Meta says it has developed the first such program for Hokkien, a southern Chinese language that does not have a standardized writing system.
Although Hokkien is written to some extent, its roughly 46 million speakers do not have a widely agreed-upon writing system, making it difficult for developers to gather the amount of high-quality written data needed to train traditional machine translation (MT) models. Meta has attempted to tackle this challenge by devising a “new modeling approach” that could also be used to develop similar models for other languages that are primarily spoken and do not have a standardized script.
“Our team developed the first speech-to-speech AI translation system that works for languages that are only spoken and not written, like Hokkien,” said Meta CEO Mark Zuckerberg in a video in which he demonstrated the technology with one of the researchers who worked on it.
Meta has open-sourced its Hokkien translation models, along with its evaluation datasets and additional research, so that other developers can produce similar models for other languages that don’t have readily available written data.
“AI-powered speech translation has mainly focused on written languages, yet nearly 3,500 living languages are primarily spoken and don’t have a widely used writing system,” the company wrote in an Oct. 19 blog post. “This makes it impossible to build machine translation tools using standard techniques, which require large amounts of written text in order to train an AI model.”
To train their model, Meta’s researchers used Mandarin Chinese, which is much more closely related to Hokkien than English is, as a sort of intermediate language: English or Hokkien speech was first translated into written Mandarin, which could then be translated into the target language.
Currently, Meta’s speech-to-speech translator only allows users to translate one sentence at a time. However, the company stated that its progress in other unsupervised learning projects could help refine speech-to-speech translation to the point that no human annotation is necessary for training.
The need for technology like this appears to be greater than ever. MultiLingual Magazine reported in July that one Nevada county was recently tasked with improving language access for speakers of another unwritten language: Shoshone. Because election ballots, as written texts, cannot be translated adequately into Shoshone, the county will work with interpreters at the polls to help Shoshone speakers vote in their native language.