The localization industry has seen many types of customized MT: models trained on bilingual corpora, glossaries, adaptive MT, and now prompting-based MT through LLMs. What do you think is the best approach to customized MT?
Bureau Works (founder and CEO Gabriel Fairman): The best approach is what we call “context sensitivity,” which uses LLMs’ analytical and predictive capabilities. We work with a retrieval-augmented generation (RAG) framework that examines the text and looks for relevant context in the translation memories (TMs), glossaries, MT repositories, work unit, and preferences. After we retrieve the context, we have a dynamic system that ranks this context according to relevance using a wide range of metadata, including author, creation date, times confirmed in the past, and semantic plausibility. We then feed this context to a cluster of LLMs that work as arbitrators to suggest the most likely outcome of all of this context. This suggestion then goes through a formatted filter and is returned to the translation editor. This is the approach that is most likely to create a translator digital twin and is therefore most dynamic and effective. It’s also easy to scale and manage, as all knowledge is stored in TMs and glossaries and does not require fine-tuning instances.
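To make the retrieve-and-rank step described above more concrete, here is a minimal Python sketch. It is not Bureau Works' implementation: the `TMEntry` fields, the `relevance` weights, and the use of `difflib` string similarity as a stand-in for semantic plausibility are all illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import List


@dataclass
class TMEntry:
    """A hypothetical translation-memory record with ranking metadata."""
    source: str
    target: str
    created: date
    times_confirmed: int


def relevance(query: str, entry: TMEntry, today: date) -> float:
    """Blend text similarity with metadata. The weights are assumptions."""
    similarity = SequenceMatcher(None, query.lower(), entry.source.lower()).ratio()
    age_years = max((today - entry.created).days, 0) / 365.0
    recency = 1.0 / (1.0 + age_years)                      # newer scores higher
    confirmations = min(entry.times_confirmed, 10) / 10.0  # capped at 10
    return 0.7 * similarity + 0.2 * confirmations + 0.1 * recency


def retrieve_context(query: str, memory: List[TMEntry], today: date,
                     top_k: int = 2) -> List[TMEntry]:
    """Return the top_k entries, most relevant first, to pass on as context."""
    ranked = sorted(memory, key=lambda e: relevance(query, e, today), reverse=True)
    return ranked[:top_k]
```

In a setup like this, the entries returned by `retrieve_context` would be placed in the prompt sent to the LLMs, so improving the TM or glossary changes the output without any fine-tuning.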
Translated (Tech Evangelist Kirti Vashee): The business objective of using translation technology that enables an enterprise to be multilingual at scale is to improve the global customer experience and drive international market success. The technology used needs to be scalable, responsive, reliable, and cost-effective, all while producing high-quality output across a large number of language combinations. Adaptive MT technology has shown itself to be the most capable enabling technology to date. More recently, we’ve seen evidence that LLMs — if properly implemented — can improve fluency and raise the overall quality for a subset of languages. However, we have yet to see this scale to the other production needs mentioned above.
We anticipate that, soon, LLMs will become a viable enterprise solution for translation. This will likely come when we move towards task-specific LLMs trained specifically for translation. These models will be smaller and more practical to deploy and maintain than today’s massive foundational models.
In the interim, both LLMs and classical MT approaches may be useful in parallel. Still, most enterprises would likely prefer a single integrated solution unless there are significant advantages for key languages by using two different production pipelines.
In general, the choice of technology will always be secondary to the positive measurable impact on global customers. MT quality differences must be balanced with latency, throughput, and cost realities. The preferred solution will probably be the technology that provides a reliable, consistent, high-quality, and cost-effective deployment in production scenarios.
LILT (VP of Growth Allison Yarborough): LILT combines all of these, and we believe this approach is best. We train adaptive MT on bilingual corpora, both from TMs and online while the translators are working; we use glossaries in the translation algorithm; and we integrate translation samples akin to LLM prompting into the MT system. Each method has its advantages and disadvantages, but we have found that the combination provides the best results.
Lionbridge (CTO Marcus Casal): While there is no one-size-fits-all approach to customized MT, new methods exist to improve its results. Traditionally, customization involved training a base model for specific brands, domains, or other use cases, but there was limited demand for this level of specificity. With the rise of LLMs, we’re identifying a new approach: using the LLM to improve the output of a base MT engine rather than customizing the engine itself.
Essentially, through a well-tuned, strategic prompt flow, we can prompt the LLM to check the quality of the translation and refine it based on specific requirements like glossaries and audience. We have found a lot of value in a two-step process that combines baseline MT engines with highly targeted LLM prompting strategies to achieve both accuracy and fluency in customized translation. And, of course, this is a prompt flow with iterative prompting ranging across personas and source/target/bilingual language to achieve the desired outcome.
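A two-step flow of this kind might be sketched as follows. This is an assumption-laden illustration in Python, not Lionbridge's prompt flow: `mt_engine` and `llm` are hypothetical callables standing in for a real MT engine and a real LLM client, and the prompt wording is invented for the example.

```python
from typing import Callable, Dict, List, Tuple


def missing_terms(source: str, translation: str, glossary: Dict[str, str]) -> List[str]:
    """List required target terms that are absent from the translation."""
    return [
        target
        for src, target in glossary.items()
        if src.lower() in source.lower() and target.lower() not in translation.lower()
    ]


def build_refinement_prompt(source: str, draft: str,
                            glossary: Dict[str, str], audience: str) -> str:
    """Assemble a review prompt from the draft, glossary, and audience."""
    terms = "\n".join(f"- {s} => {t}" for s, t in glossary.items())
    return (
        f"You are reviewing a machine translation for this audience: {audience}.\n"
        f"Source: {source}\n"
        f"Draft translation: {draft}\n"
        f"Required terminology:\n{terms}\n"
        "Return only the corrected translation."
    )


def translate_then_refine(source: str,
                          mt_engine: Callable[[str], str],
                          llm: Callable[[str], str],
                          glossary: Dict[str, str],
                          audience: str) -> Tuple[str, List[str]]:
    """Step 1: baseline MT. Step 2: LLM refinement. Then a glossary check."""
    draft = mt_engine(source)
    refined = llm(build_refinement_prompt(source, draft, glossary, audience))
    return refined, missing_terms(source, refined, glossary)
```

The second return value lists glossary terms that are still missing after refinement, which a workflow could use to trigger another prompting round or route the segment to a human reviewer.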
memoQ (Chief Evangelist Florian Sachse): Prompting-based MT through LLMs combines a strong language encoding (structure, grammar, tone of voice) with a high level of relevant domain information, which can be provided in a prompt. For certain languages, LLMs will still need to improve, which will require more data. But for many first-tier languages, LLMs can generate fluid and grammatically correct content. Increasing the correctness of the generated content will depend on prompt engineering and the context information, which, in our case, typically comes from TMs and terminology. Improving the translation quality will not work through retraining the LLM but by improving the prompt, which is much more predictable and controllable. If predictability and repeatability (continuous workflows) are key, this is the most efficient approach.
Pangeanic (founder and CEO Manuel Herranz): In 2024, the best approach to customized MT continues to be NMT. We have achieved a level of parallel corpora availability that allows for the creation of MT engines at very low cost. It scales well, and adaptation can happen in several ways. At Pangeanic, we provide the ability to inject data into a baseline model with three levels of aggressiveness, which customizes models in minutes. Other companies do it “on the fly,” a very attractive concept, but also a way to accumulate and propagate “on the fly” errors. Serious and professional workflows always require human verification of the TMX file before it is injected into the adaptive NMT engine for retraining. NMT is also much cheaper to run than LLM-based translation. It is more “controllable” for specific objectives, such as ecommerce, subtitling with a lot of conversational expressions, software, and healthcare.
Prompt-based translation is proving very popular, and it has advantages and disadvantages. The largest disadvantage is the lack of control in the output. Let’s not forget that LLMs are generative AI (GenAI). In science and engineering, we are used to having the same results if we apply the same formula. Well, we all know that asking the same question to an LLM does not necessarily guarantee the same translation result. That’s not bad if you have occasional translation needs like translating an email. But try to incorporate LLM-based translation consistently at scale while fully respecting terminology and styles, and the LLM seems to have a mind of its own.
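One way to quantify this variability is to send the same segment several times and measure how often the outputs agree. The Python sketch below is only an illustration: `translate` is a hypothetical callable wrapping whatever LLM or MT system is under test, and exact string match is a deliberately strict, assumed definition of “the same result.”

```python
from collections import Counter
from typing import Callable


def repeatability(translate: Callable[[str], str], source: str, runs: int = 5) -> float:
    """Share of runs that produced the most common output (1.0 = fully repeatable)."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    outputs = [translate(source).strip() for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs
```

A deterministic engine scores 1.0 on every segment; lower scores flag segments where terminology and style checks are most likely to be needed.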
All independent MT companies, as well as TMS companies, are working to incorporate GenAI into their workflows, but with no guarantee or customization. Prompting is not enough. There is a temptation to assume that, after getting 10 results right, all translations are going to be fine and that LLM-based translation will work just like NMT translation does. It doesn’t.
We have tested pure prompt-based LLM translation. Unless you have a specific model trained for the translation task, clever and tried prompting, and an established workflow, it will generate free versions and not “accurate” translations. In short, models trained on bilingual corpora and glossaries are very effective, and relevant and sufficient data is widely available — at least in major languages. Adaptive MT can further enhance the quality if there is sufficient and regularly updated training data.
However, prompting-based MT using LLMs offers more natural and contextually relevant translations, especially when domain-specific training data is limited or non-existent. LLM translation is great for off-the-cuff Japanese <> Spanish or Polish <> Mandarin. I do see the value there.
So, how long will we hold on to NMT? Not long, I dare say. I envisage GenAI systems that, at a similar or higher cost, offer a lot more automation from a single application programming interface (API) connection, benefitting from GenAI’s fluency and post-editing (PE) in context at scale.