Bringing 200 languages to ModernMT: A Q&A with Translated’s Marco Trombetti

Translated, a Rome-based machine translation (MT) provider, announced a big milestone for its adaptive MT platform, ModernMT: The tool now supports a total of 200 different languages. That is up from just 56 languages before the launch, and the languages on the updated roster are spoken by more than 2 billion people across the world, a testament to the company’s efforts to allow everyone to be understood.

MultiLingual recently caught up with the company’s CEO, Marco Trombetti, to learn a little more about what it took to increase the number of languages offered on ModernMT nearly fourfold.

“We envision this effort as merely the first step,” he says. “While 200 languages may appear substantial, it is not an extraordinary figure. We are at the beginning, and we plan to refine adaptive MT support for these languages in the coming months, as well as for numerous others.”

Editor’s note: This interview has been edited slightly for clarity and house style.

Going from 56 languages to 200 is such a huge leap — could you tell me a little about your team’s work to add those 144 new languages to the platform?

To train adaptive models, it was essential to have a more substantial data pool than what non-profit organizations initially offered. This was achieved through a blend of data cleansing and synthetic data generation, particularly for the newly adopted languages.

The end of your announcement mentions that “for the first time ever, 30 new languages are supported in the market, leapfrogging directly to the most capable adaptive technology” — could you speak a little bit more about that?

There are some languages not supported by any of the other main MT providers. Google caters to 134 languages, Microsoft to 112, and Amazon to 75. In total, 30 languages that we now support are being served for the first time by a commercial entity. 

Moreover, by supplying adaptive models for these languages and backing prevalent CAT tools, we establish an environment where translators can swiftly enhance MT quality by rectifying initial errors.

What were some of the challenges you guys came across in bringing them to the market?

The two main challenges were the scarcity of data for some languages and the compatibility with the adaptive model architecture. There is still a considerable amount of work to be done. Numerous languages have been trained using very small quantities of data. Methods such as the cross-lingual learning used in Meta’s No Language Left Behind offer a solution, but the associated high cost and latency make them less commercially practical.

Your announcement also mentions that your team worked with non-profits like Common Crawl and Opus on this expansion — could you tell me more about their role in the expansion? How did these nonprofits help your team expand ModernMT’s offerings?

Supporting 200 languages with adaptive MT would have been unattainable without Common Crawl’s extensive efforts in web information gathering or projects like Opus, which have curated public datasets for translations over the years. The open and transparent research conducted by Meta also made a significant contribution.

In the press release, you say that you believe this expansion will “help preserve many endangered languages” — how do you anticipate that this development will do that?

With this release, we not only provide MT for previously unsupported languages, but we also make it accessible through Matecat, our free-to-use CAT tool, and as a plugin for other popular CAT tools. By offering adaptive MT, we enhance MT quality for every language as it is employed.

Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.
