Artificial Intelligence

Three Rules for Integrating
Generative AI Into Translation Workflows

Insights from the Microsoft Globalization Team

By Bruno Lewin, Joe O’Brien, and John Wilcock

W

ith recent advances in large language models (LLMs), many organizations are investigating whether generative artificial intelligence (GenAI) can create natural-sounding target language at a lower cost and with shorter turnaround times when compared to existing machine translation (MT)-based workflows or even as a replacement for human translation. As leaders of the Microsoft Globalization Essentials team, we believe that adopting GenAI for translation is a forward-thinking approach that aligns with the latest advancements in technology. However, it’s essential to transition to this new process thoughtfully and incrementally, ensuring that it meets established benchmarks for each language before committing to full implementation.

In this article, we share advice for effectively and ethically using GenAI for translation and localization, as reflected in our recently updated and publicly available documentation on the Microsoft Learn website. Three rules are paramount as you determine whether and how to implement this emerging technology into your workflows:

  1. Consider the cost, speed, and quality when compared to alternative technologies.
  2. Use context-aware techniques to improve quality.
  3. Take steps to avoid ethical issues like cultural, gender, or other in-group bias.

Advertisement

Evaluate Risks, Costs, and Performance

When evaluating the case for shifting to GenAI-based translation, it’s crucial to consider factors such as:

  • risk management
  • output quality
  • total cost of ownership
  • performance
  • impact on people and processes

The transition to GenAI should be a step-by-step process, tailored to the specifics of each product, content type, market, language, and customer expectations. This approach allows for a balanced and justified move, especially in cases where the return on investment (ROI) might be minimal.

In terms of risk, GenAI-based translation carries a new set of challenges that require thorough human evaluation, particularly for sensitive applications that could affect the integrity or reputation of the brand. Special attention should be paid to new or updated terminology, and frequent spot-check validation of the LLM updates, as newer versions of the models might introduce degradation for some languages.

Quality control is variable across different languages. While GenAI-based translation has exceeded or matched the quality of traditional methods in some languages, it still poses significant challenges in others. The focus of the quality reviews should include two factors:

  • linguistic quality: the overall effectiveness and technical correctness of the target text
  • adequacy: faithfully conveying the meaning, style, and tone of the original text

Ensure that the text is appropriately written, maintains the linguistic quality required by your products, and is an adequate translation for the source. The latter is especially important since, as opposed to older MT approaches, LLMs can introduce fabrications or hallucinations. Fabrications are words or phrases that aren’t present in the source text but are generated by the model. The fabricated text might be factually correct, but it can also be incorrect or misleading, even when the text seems plausible.

Cost-wise, some of the latest GenAI models are slightly more cost-effective than their predecessors. However, the total cost of ownership, which includes both the operational and personnel costs, must be considered.

Enhance Accuracy With Context-Aware Techniques

An important consideration for translation is the context of the source text. Context can be as broad as the industry or the domain of the source text, or as specific as where a single word is used in a user interface (UI). Context is especially significant for short strings of UI text, where the meaning of a word or phrase might not be readily apparent. GenAI can struggle to accurately reflect context, but two new approaches can help: fine-tuning and retrieval augmented generation (RAG).

General-purpose LLMs were trained using public data that was available at the time, but this public data might not be sufficient to meet all your needs. Fine-tuning is the process of adapting a pre-trained model on additional data relevant to the translation scenario. For example, an LLM could be fine-tuned for optimized translation of medical terminology. To fine-tune an LLM for translation, you will first need a large dataset of bilingual text that can be used to train the LLM. By providing the expected translation for a large set of source text, the LLM will be more likely to generate output that reflects the training data. Existing translation memory might be an ideal dataset for training your LLM.

RAG is a pattern used in AI that uses an LLM to generate answers with your own data. For example, an LLM could be provided with the source code of a UI. When a source string for translation is combined with context provided by how each string is used in the UI code, the resulting translation is more likely to reflect the intended meaning of the source string.

Embrace Responsible AI Principles

It’s important that GenAI systems be built to provide a helpful, safe, and trustworthy experience for everyone around the world. Responsible AI practices are intended to keep people and their goals at the center of the design process and consider the benefits and potential harms that AI systems can have on society. These principles are:

  • Fairness
  • Reliability and safety
  • Privacy and security
  • Inclusiveness
  • Transparency
  • Accountability

These responsibilities require that you ensure that translations generated by LLMs accurately reflect the meaning of the source text and are free from cultural, gender, or other in-group biases. For example, an LLM trained solely on Castilian Spanish source materials might use vocabulary and idioms that are less common in Central and South America.

Responsible AI should involve mitigating the risk of this type of cultural bias. By partnering with your linguist team, you’ll be able to identify where your model isn’t meeting the needs of the target market. You’ll then be able to train your model to generate vocabulary and idioms that are appropriate for each market.

Advertisement

Conclusion

The advent of GenAI marks a pivotal moment for the translation and localization industry. As with any emerging technology, GenAI introduces new challenges and risks, and its full effect on the translation industry is yet to be determined. However, we posit that localization teams can use a measured approach to effectively and responsibly implement GenAI into translation processes where and how it makes sense for them.

While GenAI has the potential to boost productivity and efficiency, it also redefines the role of professional translators. As translation solutions evolve to integrate GenAI, we believe the expertise of human translators will remain essential to ensuring integrity and quality. Translators will continue to be critical in defining new terminology, evaluating the performance of LLMs, and ensuring that deliverables respect responsible AI practices. 

Bruno Lewin is a technical program manager at Centific who supports globalization, AI, and data at Microsoft. He previously worked in localization, engineering, compliance, and finance roles in Poland, France, Ireland, and the United States. Outside work, he is active in non-profits focused on education.

Joe O’Brien is an analytics program manager at Microsoft with many years of experience as a Japanese-English translator and localizer. He is also a linguaphile with a special interest in East Asian languages and language preservation.

John Wilcock is a program manager at Microsoft with a background in i18n and l10n. He leads the effort to update the Globalization Essentials content on Microsoft Learn. John also contributes to the Properties & Algorithms Group, a Working Group of the Unicode Technical Committee.

Advertisement

Related Articles