NVIDIA Launches Open Dataset for Multilingual Speech AI

August 18, 2025

NVIDIA has released an open-source multilingual speech AI dataset called Granary, which features nearly one million hours of audio, alongside new AI models optimized for transcription and translation across 25 European languages. The launch, announced August 15, 2025, aims to bridge the gap in speech technologies for underrepresented languages. The research team will present the Granary paper at Interspeech 2025 in the Netherlands this week.

Of the world’s roughly 7,000 languages, only a fraction are supported by AI systems. NVIDIA’s Granary dataset aims to address this imbalance with approximately 650,000 hours of speech recognition data and over 350,000 hours of speech translation data. It covers widely spoken European languages as well as lesser-resourced ones, such as Croatian, Estonian, and Maltese.

The dataset was developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, using the NVIDIA NeMo Speech Data Processor to transform vast amounts of unlabeled audio into clean, structured data. The company claims that this process significantly reduces reliance on costly human annotation, making large-scale AI training more efficient and inclusive. According to NVIDIA’s official announcement, models trained on Granary reach target accuracy for automatic speech recognition and translation with about half as much training data as other popular corpora require.

Canary and Parakeet: AI Models Built on Granary

Two new models showcase Granary’s potential:

NVIDIA Canary-1b-v2: A billion-parameter model that NVIDIA says tops Hugging Face’s leaderboard for multilingual speech recognition and runs inference up to 10 times faster than comparable large models. It supports transcription and translation between English and two dozen languages.
NVIDIA Parakeet-tdt-0.6b-v3: A high-throughput model designed for real-time or large-scale transcription. NVIDIA states that it can transcribe 24-minute audio clips in a single pass, automatically detects the input language, and delivers results with low latency.

Both models produce outputs with punctuation, capitalization, and timestamps, making them ready for production environments such as customer service bots, multilingual chat tools, and near-real-time interpretation systems.
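For localization teams, timestamped transcripts are most immediately useful as subtitle cues. The following is a minimal sketch of that step, not part of NVIDIA's release: the Segment structure, its field names, and the sample sentences are hypothetical stand-ins, and the models' actual output format may differ. It uses only the Python standard library to render timestamped segments in the SRT subtitle format.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical timestamped segment; not the models' actual output schema."""
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}"
        )
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    # Made-up sample segments for illustration only.
    demo = [
        Segment(0.0, 2.5, "Dobar dan, kako vam mogu pomoći?"),
        Segment(2.5, 5.04, "Tere, kuidas saan aidata?"),
    ]
    print(to_srt(demo))
```

Because punctuation and capitalization would already be present in the transcript text, a converter like this would need no extra text cleanup before the cues go to a subtitle editor.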

Why It Matters for Localization

For the localization industry, Granary and its companion models signal progress toward more inclusive, scalable speech technologies. By providing open access to multilingual resources, NVIDIA is equipping developers to expand AI-powered services across Europe’s diverse linguistic landscape. The implications are significant: improved access to underrepresented languages, faster model development through NVIDIA NeMo, and a broader ecosystem of speech-enabled applications.

As demand grows for real-time, accurate voice technologies, Granary’s open-source foundation may inspire similar initiatives beyond Europe. For now, its release provides the localization community with a dataset designed not only to scale technology, but also to reflect linguistic diversity.

