Researchers at the international tech company NVIDIA recently developed an award-winning automatic speech recognition (ASR) model for Telugu.
Though it’s among the most widely spoken languages in the world, Telugu has few high-quality ASR models and is classified as a low-resource language. The NVIDIA researchers took first place in an ASR competition held in October by the International Institute of Information Technology, Hyderabad (IIIT-Hyderabad).
“ASR is gaining a lot of momentum in India majorly because it will allow digital platforms to onboard and engage with billions of citizens through voice-assistance services,” said Megh Makwana, one of the NVIDIA researchers who participated in the competition, in a blog post the company shared on Dec. 2.
While ASR isn’t perfect, the technology is generally quite accurate, at least for English. For low-resource languages, though, this is far from the case, because large datasets of speech in these languages are hard to come by. Take Telugu, for instance: it’s among the 20 most widely spoken languages in the world, with more than 80 million native speakers. Yet speech recognition tools fail to do the language justice, because the available Telugu speech datasets are smaller than those for other languages.
The competition consisted of two tracks: an open track, in which participants could use any dataset and pretrained model, and a closed track, in which the competition’s organizers supplied a dataset of roughly 2,000 hours of Telugu speech. NVIDIA’s team came in first place in both the open and closed tracks, achieving word error rates (WER) of 12% and 13%, respectively.
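For readers unfamiliar with the metric, WER is conventionally defined as the number of word-level substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that standard definition. It is not the competition's scoring script; the function name `word_error_rate` and the English sample sentences are made up for illustration, and real evaluations typically normalize text (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# in a six-word reference.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Under this definition, a WER of 12% means roughly 12 word-level errors for every 100 words in the reference transcript.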
These results are on par with state-of-the-art ASR models. According to data from Statista, Rev.ai’s ASR model has a WER of 14%, and Google’s comes in at around 16%. NVIDIA’s research team believes that its work can serve as a “baseline model” that others can use to create more accessible ASR for Telugu.