Artificial Intelligence

Mitigating Hallucinations in AI-Powered Translation

By Olga Beregovaya and Alex Yanishevsky


Artificial intelligence (AI) has revolutionized translation, providing high-quality output at unprecedented speed and scale. Today’s AI-powered translation tools convert speech, text, and images into different languages in seconds, enabling global communication and collaboration like never before. But these massive capabilities come with massive shortcomings.
 
Human communication is complex and ever-evolving. Language encompasses more than words: it requires a deep knowledge of tone, context, emotions, intent, and shared experiences. Although generative AI-powered translation systems are remarkably fluent, they sometimes struggle with relevance and accuracy in areas such as voice, tone, cultural nuance, and even verifiable facts.
 
When it comes to translation, large language models (LLMs) can — and do — make mistakes. One of the biggest challenges in AI translation is the presence of hallucinations. Unlike simple mistranslations, hallucinated outputs introduce false information by generating details that were not present in the original text. What’s more, these hallucinations usually appear credible and are delivered with the same confidence as accurate outputs.
 
Even minor inaccuracies in translation can have major consequences. False fluency — where the text sounds natural in the target language but is actually wrong — can lead to misunderstandings, misinformation, and, ultimately, distrust. In extreme circumstances, it can even result in legal proceedings.
 
To realize the full potential of AI translation tools, we must first recognize their pitfalls. The good news is that there are many practical ways to detect and mitigate AI hallucinations in translation. This article delves into the reasons behind the problem and the techniques that can be used to address it.


What Causes AI Hallucinations?

Unpredictable model behavior in translation can stem from a mix of factors: technical limitations in model architecture, noisy training datasets, vague or ambiguous source inputs, and unstable decoding parameters. There is also the simple fact that general-purpose foundation models were not initially built for translation; the ability to translate is an ancillary benefit. Because training data is predominantly English-centric, the risk of hallucination is higher for languages other than English, especially under-resourced languages and languages that are both complex and linguistically distant from English (such as Estonian or Turkish).
 
Different translation models have different blind spots. Neural machine translation (NMT) models excel at delivering actionable, consistent, and predictable translations, especially in well-supported languages with abundant training data. Through continuous re-training and adaptation, purpose-built NMT models also tend to gradually produce more relevant on-brand translations. However, they tend to be more limited in language fluency and understanding of the context present in the source language. On the other hand, LLMs leverage diverse datasets to perform a multitude of tasks, from coding to search and document summarization, with translation being just one of many tasks. While LLMs have the ability to incorporate rich, cross-domain knowledge, they carry a higher risk of distorting the meaning of source material.
 
Both types of systems rely on tokens, rather than full words, to process text. These tokens can be words, subwords, or even punctuation. Tokenization can be a fundamental constraint, even with large context windows. For example, languages written without spaces between words, such as Chinese and Japanese, are often tokenized at the character level, which increases the risk of unpredictable translations and hallucinations.
 
Model decoding parameters, such as temperature, add further variance and unpredictability to generated outputs. Temperature balances creativity against predictability, and raising it can increase hallucinations. At higher settings, the model samples more often from less likely tokens when generating text. This can produce more natural-sounding translations, but it risks introducing errors or even completely fabricated details while still sounding fluent, which is the “false fluency” problem.
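To make the effect of temperature concrete, here is a minimal sketch of the standard temperature-scaled softmax used when sampling the next token. The three scores are hypothetical, and real systems apply this over a vocabulary of many thousands of tokens.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw token scores into probabilities, scaled by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens (assumption, not real model output).
logits = [4.0, 2.0, 1.0]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a low temperature almost all of the probability sits on the top-scoring token. At a higher temperature the distribution flattens, so unlikely tokens are sampled more often, which is where fabricated details can creep in.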
 
Non-deterministic AI models can give different outputs from the same input; it’s impossible to tell if a translation prompt will work by just reading it. In LLM-based translation, surface accuracy can be misleading: A prompt with spelling mistakes or logical gaps might still yield a fluent output, while a carefully detailed prompt can fail unexpectedly. Variations in domain or tone further increase the risk of hallucinations, producing inconsistent or unreliable results.

How to Mitigate AI Hallucinations

Deploying practical mitigation strategies can reduce the risk of hallucinations, enhance linguist productivity, and optimize overall translation quality.
 
One of the simplest ways to identify hallucinations is to run a mechanical, rule-based check as a guardrail. Much like a conventional spelling or grammar checker, this automated verification reviews structural properties, such as word count, punctuation, spelling, and terminology, and quickly flags potential hallucinations. Take the source-to-target length ratio as an example. If an input is 20 words and the resulting output is 200 words, there is probably a hallucination in the output; otherwise, where did the extra words come from? A drastic change in word count often indicates hallucinations or unnecessary additions that require automated correction or, even better, human intervention.
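The length-ratio check could be sketched roughly as follows. The function name and the 0.5 and 2.0 thresholds are assumptions chosen for illustration; sensible bounds vary by language pair. Whitespace word counting also does not work for languages written without spaces, which would need character counts or a proper segmenter.

```python
def length_ratio_flag(source, target, low=0.5, high=2.0):
    """Return True when the target/source word ratio falls outside [low, high].

    The thresholds are illustrative assumptions, not recommended values.
    """
    source_words = len(source.split())
    target_words = len(target.split())
    if source_words == 0:
        # Any output for an empty source is suspicious.
        return target_words > 0
    ratio = target_words / source_words
    return ratio < low or ratio > high


source = " ".join(["word"] * 20)         # a 20-word input
suspect = " ".join(["palabra"] * 200)    # a 200-word output
plausible = " ".join(["palabra"] * 23)   # a 23-word output

print(length_ratio_flag(source, suspect))
print(length_ratio_flag(source, plausible))
```

A flagged segment is not proof of a hallucination; it is a cheap signal that the segment deserves automated correction or human review.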
 
Another strategy applies machine learning-based techniques, such as checking semantic similarity and lexical accuracy with current LLMs and state-of-the-art embedding models such as “text-multilingual-embedding-002” on Google’s Vertex AI platform. It’s also possible to experiment with more advanced approaches such as semantic entropy algorithms and log probability analysis. These meaning-based approaches look for ambiguous phrasing and inconsistent outputs, which can indicate a potential hallucination. The higher the semantic dissimilarity between source and translation, the higher the probability that something is off.
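As a rough illustration of the semantic similarity check, the sketch below compares two embedding vectors with cosine similarity. It assumes the source and the translation have already been embedded by a multilingual embedding model; the vectors, the function names, and the 0.8 threshold are all hypothetical, and no embedding API is called here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_suspicious(source_vec, target_vec, threshold=0.8):
    """Flag a translation whose embedding drifts too far from the source.

    The threshold is an assumption; it should be tuned on real, reviewed data.
    """
    return cosine_similarity(source_vec, target_vec) < threshold


# Toy vectors standing in for real embeddings (hypothetical values).
source_vec = [0.9, 0.1, 0.3]
close_vec = [0.85, 0.15, 0.28]
drifted_vec = [0.1, 0.9, -0.2]

print(looks_suspicious(source_vec, close_vec))
print(looks_suspicious(source_vec, drifted_vec))
```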
 
The following five approaches have also proven successful at mitigating hallucinations.


Properly structured prompts

It sounds obvious, but prompts that reinforce fidelity keep an AI translation anchored to the source material instead of letting the model fill in the blanks. Be as specific as possible, and give the model explicit instructions. For example, a safeguard instruction in an LLM prompt could be: “If you’re not certain, reply with ‘I don’t know’ as part of the reply.” This is especially critical for domain-specific content, such as legal, medical, or technical documents.
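A structured, fidelity-focused prompt might be assembled along these lines. Apart from the safeguard sentence quoted in the text, the instruction wording, the function name, and the <source> tag convention are assumptions for illustration only.

```python
def build_translation_prompt(source_text, source_lang, target_lang, domain):
    """Assemble a fidelity-focused translation prompt (illustrative wording)."""
    instructions = [
        f"Translate the text inside the <source> tags from {source_lang} to {target_lang}.",
        f"The content is {domain} material; keep terminology precise.",
        "Do not add, omit, or explain anything that is not in the source.",
        "Preserve numbers, names, and units exactly as written.",
        # Safeguard instruction taken from the article text.
        "If you're not certain, reply with 'I don't know' as part of the reply.",
    ]
    return "\n".join(instructions) + f"\n<source>\n{source_text}\n</source>"


prompt = build_translation_prompt(
    "Take one tablet twice daily with food.",
    source_lang="English",
    target_lang="Turkish",
    domain="medical",
)
print(prompt)
```

Keeping the instructions in a list makes it easy to review and version them separately from the text being translated.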

Proper model for the task

A diverse set of models is available for AI translation. Their size, cost, speed, and quality can dramatically affect your ability to translate at scale. To optimize the AI translation flow, you need to understand the languages each model supports, its optimal prompting style, its fundamental ability to perform specific linguistic tasks, the prompt size it can handle before quality degrades, the appropriate parameter setup, and the tradeoff between quality and cost.

Retrieval-augmented generation (RAG)

RAG combines AI language models with retrieved knowledge sources that provide context. It grounds translations in your existing linguistic assets, such as translation memory matches, style preferences, and available glossary terms. The result is richer, more accurate translations that reflect your preferences.
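The retrieval step can be sketched in a few lines. This example uses Python's standard difflib for fuzzy translation memory matching, which is a simple stand-in for whatever retrieval method a production system uses. The data, function names, and the 0.6 minimum score are hypothetical.

```python
from difflib import SequenceMatcher


def retrieve_tm_matches(source, translation_memory, min_score=0.6, top_k=2):
    """Return the best fuzzy matches as (score, tm_source, tm_target) tuples."""
    scored = []
    for tm_source, tm_target in translation_memory:
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score >= min_score:
            scored.append((score, tm_source, tm_target))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


def glossary_hits(source, glossary):
    """Return glossary entries whose source term appears in the text."""
    lowered = source.lower()
    return {term: target for term, target in glossary.items() if term.lower() in lowered}


def build_context(source, translation_memory, glossary):
    """Format retrieved assets as context lines to place ahead of the prompt."""
    lines = []
    for score, tm_source, tm_target in retrieve_tm_matches(source, translation_memory):
        lines.append(f"Similar past translation ({score:.0%} match): {tm_source} => {tm_target}")
    for term, target in glossary_hits(source, glossary).items():
        lines.append(f"Required term: {term} => {target}")
    return "\n".join(lines)


# Hypothetical linguistic assets.
tm = [
    ("Press the power button to restart.", "Appuyez sur le bouton d'alimentation pour redémarrer."),
    ("Close the window.", "Fermez la fenêtre."),
]
glossary = {"power button": "bouton d'alimentation"}

print(build_context("Press the power button to shut down.", tm, glossary))
```

The resulting context lines would be passed to the model together with the translation instructions, so the output follows approved translations and terminology rather than the model's own guesses.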

Model as a judge

The “model as a judge” approach uses LLMs to evaluate another model’s outputs. This practical mitigation strategy implements a self-healing loop to correct identified issues through automated post-editing and smoothing.
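The loop might look something like the sketch below. The three callables are hypothetical stand-ins for LLM calls (a translator, a judge that returns a verdict with feedback, and a reviser that post-edits the draft), and the round limit is an assumption to keep the loop from running indefinitely.

```python
def translate_with_judge(source, translate, judge, revise, max_rounds=2):
    """Translate, let a judge model review the draft, and revise until accepted.

    translate(source) -> draft
    judge(source, draft) -> (accepted: bool, feedback: str)
    revise(source, draft, feedback) -> new draft
    Returns (final_draft, accepted). All three callables are placeholders
    for model calls and are not part of any specific library.
    """
    draft = translate(source)
    for _ in range(max_rounds):
        accepted, feedback = judge(source, draft)
        if accepted:
            return draft, True
        draft = revise(source, draft, feedback)
    # Check the last revision once more before giving up.
    accepted, _ = judge(source, draft)
    return draft, accepted


# Toy stand-ins so the sketch runs without any model.
def toy_translate(source):
    return "La reunion es el lunes y habra pizza gratis."  # adds an invented detail


def toy_judge(source, draft):
    if "pizza" in draft:
        return False, "The draft mentions pizza, which is not in the source."
    return True, ""


def toy_revise(source, draft, feedback):
    return "La reunion es el lunes."


print(translate_with_judge("The meeting is on Monday.", toy_translate, toy_judge, toy_revise))
```

When the loop ends without acceptance, the returned flag is False, which is a natural point to route the segment to a human reviewer.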

Human in the loop

While AI brings speed and consistency to translation, human reviewers bring unmatched creativity, emotional intelligence, cultural knowledge, and critical-thinking skills. With AI-powered translation tools, subject-matter experts can verify machine outputs and correct mistakes.


Towards Better AI Translation

The need for fast, accurate multilingual translations spans all industries. And with stakeholder safety, trust, and finances on the line, bad translations are not an option.

At least with current AI technology, hallucinations can’t be completely eliminated; until AI models have enough data and real-world knowledge, errors are inevitable. This is especially true for lower-resourced languages, such as many Indic, African, and long-tail Slavic languages, which remain at higher risk of hallucinations and inaccuracies due to insufficient lexical and corpus coverage.

Overcoming these challenges will require new resources, diverse datasets, fine-tuned models, and output validation that is impossible with machines alone. The future of high-fidelity translations will combine proactive mitigation and human oversight to bring about seamless human-model collaboration. As AI translation models evolve, human oversight is — and will continue to be — essential to detecting hallucinations, correcting inaccuracies, and creating meaningful connections across cultures.

Olga Beregovaya is Vice President of AI at Smartling. She has more than 25 years of experience in natural language processing, machine learning, AI model development, and global content delivery. Olga serves as a Technology Program Sponsor for Women in Localization.
Alex Yanishevsky is Senior Director of AI Solutions at Smartling. His areas of expertise include AI, natural language processing, machine translation, data mining, and computational linguistics. He has written numerous articles for industry journals and has presented at industry conferences.
