Why Compare Adaptive With Generic MT?
“Selecting an MT service often involves private testing, which is time-consuming and requires in-depth knowledge of the MT systems being tested, and reliance on third-party analysis, which focuses mainly on comparing leading generic (or static), publicly available MT systems. Such analysis doesn’t take into account ModernMT’s unique ability to instantly adapt to specific content without any training.
This is why we asked Polyglot Technology to conduct an independent study that evaluates and compares out-of-the-box MT systems. The results further demonstrate that ModernMT’s adaptive model, with access to a small translation memory but without additional training, provides an unparalleled level of accuracy and context awareness right out of the box, which static models simply can’t match without additional effort.”
How Is This Study Unique?
“Companies can easily replicate the study using the available MT solutions with their own content. The study used publicly available evaluation and comparison scripts and a public Autodesk dataset, translated from US English into German, Italian, Spanish, Brazilian Portuguese, and Simplified Chinese. It explores a typical example where the generic baseline needs to adapt immediately to enterprise domain content to be useful. It also focuses on the MT systems’ ability to handle different languages, contexts, and specialized terminology, providing a direct comparison of these tools in typical translation workflows.
Polyglot Technology’s research employed commonly used quality measurement metrics (COMET, TER, and SacreBLEU) and tested the main public MT systems (Amazon Translate, DeepL Translator, Google Translate, and Microsoft Translator) against multiple ‘no-effort’ ModernMT models (static, adaptive, and adaptive with access to an Autodesk TM of 10,000 segments).”
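For readers who want a feel for how such a comparison works, here is a minimal sketch in Python. It is not the study's evaluation scripts and does not use COMET, TER, or SacreBLEU. Instead, it computes a simplified word-level edit rate, a stand-in for TER that omits TER's block-shift operation. The system names and sentences are made-up placeholders, and the function names are hypothetical.

```python
def edit_distance(hyp_tokens, ref_tokens):
    """Word-level Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(ref_tokens) + 1))
    for i, hyp_word in enumerate(hyp_tokens, start=1):
        current = [i]
        for j, ref_word in enumerate(ref_tokens, start=1):
            cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,         # drop a word from the MT output
                current[j - 1] + 1,      # add a missing reference word
                previous[j - 1] + cost,  # keep or substitute a word
            ))
        previous = current
    return previous[-1]


def corpus_edit_rate(hypotheses, references):
    """Total edits divided by total reference words; lower is better."""
    edits = sum(
        edit_distance(h.split(), r.split())
        for h, r in zip(hypotheses, references)
    )
    ref_words = sum(len(r.split()) for r in references)
    return edits / ref_words if ref_words else 0.0


def rank_systems(outputs_by_system, references):
    """Return (system, edit_rate) pairs sorted from best to worst."""
    scores = {
        name: corpus_edit_rate(outputs, references)
        for name, outputs in outputs_by_system.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


# Placeholder data for illustration only (not taken from the study).
references = ["click the save button", "open the file menu"]
outputs_by_system = {
    "system_a": ["click the save button", "open the file menu"],
    "system_b": ["press the save key", "open file menu"],
}

for name, rate in rank_systems(outputs_by_system, references):
    print(f"{name}: edit rate {rate:.3f}")
```

To replicate the study itself, the same loop structure would apply, but the scoring function would come from the publicly available metric implementations rather than this simplified edit rate.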
What About LLMs?
“Translated also ran OpenAI’s GPT-4, the state-of-the-art large language model (LLM), through the same evaluation tests and quality assessment and found that GPT-4 consistently performed worse than any of the leading neural MT services tested. In our experience, LLMs perform best when given a complete document and clear context. This isn’t the case in many MT use cases, such as sending individual user interface components for translation.
LLMs also require sophisticated fine-tuning and prompt-driven modifications to even attempt to address enterprise domain optimization. Nevertheless, we expect LLMs to play an important role in the evolution of MT.”