Basque Language Finds a Digital Ally in Local AI Translation Tools

In Bilbao, Spain, a regional push for language preservation is meeting the latest advancements in artificial intelligence. At the heart of this effort is Vicomtech, a nonprofit research center focused on applied AI for speech and language technology. With support from both private entities and multiple levels of government, the organization is working to develop translation tools tailored to the unique needs of the Basque language.

One of Vicomtech’s key contributions is Itzuli, an automated tool that translates between Basque, Spanish, English, and French. Integrated into the Basque government’s website, Itzuli also supports formal legal translation and is expanding to include the Bizkaian dialect. Although Itzuli is less visible than global platforms like Google Translate, it handles approximately 300,000 translations daily, underscoring its role as a local solution rooted in linguistic nuance.

Adapting AI to Local Needs

Public broadcaster EITB is advancing AI-based translation technologies with a context-specific approach. The network combines human transcription, automatic subtitling, and hybrid systems depending on the content’s sensitivity. News programming, for instance, receives closer supervision, while other platforms, such as the audio service Guau and the news portal Orain, incorporate automatic translation into multiple languages using Itzuli.

EITB emphasizes that quality and accuracy are particularly critical in minority languages. Automated transcription may be suitable for general or entertainment content, but sensitive or official programming often requires human oversight to avoid linguistic errors.

A Corpus Built on Collaboration

Supporting these efforts is Euskorpora, a nonprofit consortium curating the Basque Language Digital Corpus. The project includes a wide range of Basque audio, text, and video materials, collected with full legal permissions. Designed for both academic and commercial use, the corpus represents multiple dialects and speech styles.

Instead of maximizing quantity, Euskorpora focuses on data quality, addressing challenges such as noisy audio and limited availability of technical language. The project also explores the cautious use of synthetic data to fill content gaps in fields like engineering and law.

This collaborative and careful development model highlights how regional initiatives can support digital inclusion while strengthening local language infrastructure.

MultiLingual Staff
MultiLingual creates go-to news and resources for language industry professionals.

MultiLingual Media LLC