You won’t unlock the full potential of AI with current TMs

Supported by Translated

Data curation is crucial for enhancing translation quality and tackling efficiency and scalability challenges. To address these needs, Translated developed TranslationOS, the future-proof solution for AI-first localization.

Why has the focus on data risen to such prominence in localization circles?

“With the expanding reach of AI, professionals are homing in on the essential elements that ensure the success of AI projects. There is a consensus among experts that data quality is the fundamental factor influencing the performance of AI systems: the best-performing AI applications are driven by high-quality data. Traditionally, the data emphasis has been on translation memory (TM), which is beneficial for MT. Still, newer AI capabilities can leverage more comprehensive contextual and reference data to yield outputs of superior quality and subtlety. Access to high-quality, contextually rich data markedly improves the performance of MT engines within the Translated ecosystem, whose output improves progressively over time. Identifying and organizing a broader, more influential dataset (beyond TM) is critical to generating optimal results with the most advanced and powerful Language AI technologies. Hence, the focus on data curation and advanced data management is becoming increasingly prominent, paving the way for a new era of productivity that allows businesses to scale effectively while preserving quality.”

What are the limitations of translation memory and legacy TMS platforms in an AI-first world?

“Translation memories store previous translations and present them for reuse with new segments, but they lack metadata and don’t adapt to new contexts or learn from feedback unless manually updated. These systems focus too narrowly on isolated translation segments, neglecting the wider document context, tone, and stylistic subtleties. This results in inefficiencies, inconsistencies, and avoidable errors, particularly in multi-vendor environments, and it necessitates substantial corrective work. Accessing and sharing TM data with other AI tools is also difficult, because TMs are designed for segment matching within a single proprietary system.

In contrast, the AI-first approach within TranslationOS is designed to continuously learn and adapt, comprehending the specific context and tone of documents and incorporating previous corrections. The dynamic and flexible curated data infrastructure within TranslationOS can be repurposed for new Language AI, quickly and efficiently delivering quality improvements as these technologies emerge. AI outputs improve daily with minimal management, and the more the system is used, the faster quality and efficiency improve.”

What is needed to create an effective data curation infrastructure in a translation production environment?

“Establishing a robust data curation infrastructure is crucial for enhancing AI results, and it requires a holistic approach that looks beyond typical TMS data. The contextual and situational background data that expert translators find valuable is just as important for AI systems. It’s not only about having a translation memory; it’s about having access to collaborative inputs from related past projects, thorough document contexts, glossaries, style guides, and accurate labeling.

The essence of data curation lies in compiling this broad contextual dataset and formatting it effectively for AI use. The more detailed and relevant the contextual data, the better the AI will perform. This data must be continuously cleaned and updated to support ongoing production efficiency. Meticulous gathering and standardization of expert feedback ensure consistency, and ongoing quality monitoring supports long-term improvement. Promptly identifying and correcting errors in production workflows guarantees a constant supply of fresh, high-quality, validated data for continuous model refinement.”
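To make the idea of a curated, context-rich record more concrete, here is a minimal Python sketch. It is an illustration only and does not describe how TranslationOS stores or processes data: the `CuratedSegment` schema, its field names, and the `clean` helper are all hypothetical. It shows a translation unit that carries document context, a domain label, glossary terms, and a review date alongside the source and target text, plus one simple cleaning pass that keeps only the most recently reviewed translation per source text and domain.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CuratedSegment:
    """One translation unit plus context a bare TM entry lacks (hypothetical schema)."""
    source: str
    target: str
    document_id: str                      # ties the segment to its wider document
    domain: str                           # label such as "ui" or "legal"
    reviewed_on: date                     # when an expert last validated the target
    glossary_terms: tuple[str, ...] = ()  # approved terminology used in the segment
    reviewer_note: str = ""               # standardized expert feedback, if any


def clean(records: list[CuratedSegment]) -> list[CuratedSegment]:
    """Drop records with empty targets and keep only the most recently
    reviewed translation for each (source text, domain) pair."""
    latest: dict[tuple[str, str], CuratedSegment] = {}
    for rec in records:
        if not rec.target.strip():
            continue  # an empty target is noise, not usable data
        key = (rec.source.strip().lower(), rec.domain)
        kept = latest.get(key)
        if kept is None or rec.reviewed_on > kept.reviewed_on:
            latest[key] = rec
    return list(latest.values())


# Hypothetical sample data: two competing translations and one empty entry.
records = [
    CuratedSegment("Sign in", "Accedi", "doc-1", "ui", date(2023, 1, 10)),
    CuratedSegment("Sign in", "Accedere", "doc-2", "ui", date(2024, 3, 5),
                   reviewer_note="Use the infinitive for buttons"),
    CuratedSegment("Sign in", "", "doc-3", "ui", date(2024, 6, 1)),
]
cleaned = clean(records)
```

In a real pipeline the cleaning rules would be richer (for example, weighing reviewer seniority or quality scores), but the design point is the same: the context travels with the segment, so any downstream AI tool can use it.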

Kirti Vashee is a Language Technology Evangelist at Translated and was previously an independent consultant focusing on MT and translation technology. He was formerly associated with several MT developers, including the original Language Weaver (SMT), RWS/SDL, Systran, and Asia Online. He has extensive experience with many large-scale MT initiatives over the last 15 years. He moderates the Automated Language Translation (MT) group on LinkedIn, which has 14,000+ members, and is a former board member of AMTA (Association for Machine Translation in the Americas).



General Information info@multilingual.com

Subscription subscriptions@multilingual.com

Advertising
advertising@multilingual.com

Editorial Questions editor@multilingual.com

Privacy Policy

