Evaluate Risks, Costs, and Performance
When evaluating the case for shifting to GenAI-based translation, it’s crucial to consider factors such as:
- risk management
- output quality
- total cost of ownership
- performance
- impact on people and processes
The transition to GenAI should be a step-by-step process, tailored to the specifics of each product, content type, market, language, and customer expectations. This approach allows for a balanced and justified move, especially in cases where the return on investment (ROI) might be minimal.
In terms of risk, GenAI-based translation introduces a new set of challenges that require thorough human evaluation, particularly for sensitive applications that could affect the integrity or reputation of the brand. Pay special attention to new or updated terminology, and run frequent spot checks whenever the LLM is updated, because newer versions of a model might degrade quality for some languages.
Quality varies across languages. While GenAI-based translation has matched or exceeded the quality of traditional methods in some languages, it still poses significant challenges in others. Quality reviews should focus on two factors:
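One way to make spot checks after a model update repeatable is to compare reviewer scores per language between the previous and the new model version. The following is a minimal sketch, not a prescribed process: the function name `flag_regressions`, the 0-to-1 score scale, the tolerance value, and the sample scores are all assumptions made for illustration.

```python
from statistics import mean

def flag_regressions(baseline, candidate, tolerance=0.02):
    """Return languages whose mean review score dropped by more than `tolerance`.

    `baseline` and `candidate` map a language code to a list of per-segment
    review scores (assumed here to be in the range 0 to 1) for the same
    spot-check sample, translated by the old and the new model version.
    """
    regressions = {}
    for lang, old_scores in baseline.items():
        new_scores = candidate.get(lang)
        if not new_scores:
            continue  # no sample for this language in the new run
        drop = mean(old_scores) - mean(new_scores)
        if drop > tolerance:
            regressions[lang] = round(drop, 3)
    return regressions

# Hypothetical reviewer scores for two model versions.
baseline = {"de": [0.9, 0.8, 1.0], "ja": [0.9, 0.9, 0.9], "pl": [0.8, 0.7, 0.9]}
candidate = {"de": [0.9, 0.9, 1.0], "ja": [0.7, 0.8, 0.6], "pl": [0.8, 0.7, 0.9]}

print(flag_regressions(baseline, candidate))
```

With these sample scores, only Japanese is flagged, which would be the cue to hold the update for that language until linguists have reviewed it.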
- linguistic quality: the overall effectiveness and technical correctness of the target text
- adequacy: faithfully conveying the meaning, style, and tone of the original text
Ensure that the text is appropriately written, maintains the linguistic quality required by your products, and is an adequate translation of the source. Adequacy is especially important because, unlike older machine translation (MT) approaches, LLMs can introduce fabrications or hallucinations. Fabrications are words or phrases that aren’t present in the source text but are generated by the model. The fabricated text might be factually correct, but it can also be incorrect or misleading, even when the text seems plausible.
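Human review remains the real safeguard against fabrications, but a cheap automated pre-check can help reviewers prioritize. The sketch below flags numbers that appear in the translation but not in the source. It is only a heuristic under stated assumptions: the function name `unmatched_numbers` is hypothetical, it covers numeric content only, and it does not account for locale-specific number formatting (for example, `1,000` versus `1.000`).

```python
import re

# Matches integers and simple decimal or grouped numbers such as 30, 2.5 or 1,000.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def unmatched_numbers(source, target):
    """Return numbers found in the target text that do not occur in the source."""
    source_numbers = set(NUMBER.findall(source))
    return [n for n in NUMBER.findall(target) if n not in source_numbers]

source = "Restart the device after 30 seconds."
target = "Starten Sie das Gerät nach 30 Sekunden neu. Dies dauert etwa 5 Minuten."

print(unmatched_numbers(source, target))  # ['5']
```

A non-empty result doesn't prove a fabrication. It only marks the segment as a candidate for closer human review.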
Cost-wise, some of the latest GenAI models are slightly more cost-effective than their predecessors. However, the total cost of ownership, which includes both the operational and personnel costs, must be considered.
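To illustrate why per-word model pricing alone can be misleading, the following sketch adds human review and fixed costs to the model cost. Every figure and parameter name here is hypothetical and chosen only to show the arithmetic. Substitute your own rates and volumes.

```python
def total_cost_of_ownership(words, model_cost_per_1k_words, review_rate,
                            review_cost_per_word, fixed_costs):
    """Estimate total cost as model usage + human review + fixed costs.

    `review_rate` is the fraction of translated words that linguists review.
    `fixed_costs` stands in for tooling, integration, and training effort.
    """
    model_cost = words / 1000 * model_cost_per_1k_words
    review_cost = words * review_rate * review_cost_per_word
    return model_cost + review_cost + fixed_costs

# Hypothetical scenario: 1,000,000 words, 20% of them reviewed by linguists.
estimate = total_cost_of_ownership(
    words=1_000_000,
    model_cost_per_1k_words=0.50,
    review_rate=0.20,
    review_cost_per_word=0.05,
    fixed_costs=5_000,
)
print(f"Estimated total cost: {estimate:,.2f}")
```

In this made-up scenario, model usage accounts for 500 of the 15,500 total, so a small change in per-word model pricing matters far less than the share of content that needs human review.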
Enhance Accuracy With Context-Aware Techniques
An important consideration for translation is the context of the source text. Context can be as broad as the industry or the domain of the source text, or as specific as where a single word is used in a user interface (UI). Context is especially significant for short strings of UI text, where the meaning of a word or phrase might not be readily apparent. GenAI can struggle to accurately reflect context, but two new approaches can help: fine-tuning and retrieval-augmented generation (RAG).
General-purpose LLMs were trained using public data that was available at the time, but this public data might not be sufficient to meet all your needs. Fine-tuning is the process of adapting a pre-trained model by training it further on additional data relevant to the translation scenario. For example, an LLM could be fine-tuned for optimized translation of medical terminology. To fine-tune an LLM for translation, you first need a large dataset of bilingual text. When you provide the expected translation for a large set of source text, the LLM becomes more likely to generate output that reflects the training data. Existing translation memory might be an ideal dataset for training your LLM.
RAG is a pattern in which relevant information is retrieved from your own data and supplied to the LLM along with the request, so that the generated output is grounded in that data. For example, an LLM could be provided with the source code of a UI. When a source string for translation is combined with context provided by how each string is used in the UI code, the resulting translation is more likely to reflect the intended meaning of the source string.
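As an illustration of preparing translation memory for fine-tuning, the sketch below converts source and target pairs into JSON Lines records. The `prompt` and `completion` field names, the instruction wording, and the function name `tm_to_jsonl` are assumptions. The actual schema depends on the fine-tuning service you use, so check its documentation before adopting this layout.

```python
import json

def tm_to_jsonl(pairs, source_lang, target_lang):
    """Convert (source, target) translation memory pairs into JSON Lines text.

    Empty segments and exact duplicate pairs are skipped.
    """
    lines = []
    seen = set()
    for source, target in pairs:
        source, target = source.strip(), target.strip()
        if not source or not target or (source, target) in seen:
            continue
        seen.add((source, target))
        record = {
            # Hypothetical record layout; adjust to your provider's format.
            "prompt": f"Translate from {source_lang} to {target_lang}: {source}",
            "completion": target,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)

# Hypothetical translation memory entries.
translation_memory = [
    ("Save", "Speichern"),
    ("Save", "Speichern"),      # duplicate, skipped
    ("Open file", "Datei öffnen"),
    ("", "Leerer Eintrag"),     # empty source, skipped
]

print(tm_to_jsonl(translation_memory, "English", "German"))
```

Removing duplicates and empty segments before training is a modest cleanup step. Real translation memories usually need further review, because outdated or inconsistent entries would be learned along with the good ones.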
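The UI example can be sketched as follows. This is a simplified illustration under stated assumptions: retrieval is done with plain substring matching as a stand-in for the search or embedding-based retrieval a production system would more likely use, the function names and prompt wording are hypothetical, and the call to the LLM itself is left out.

```python
def retrieve_context(source_string, ui_snippets, limit=2):
    """Return up to `limit` UI code snippets that mention the source string."""
    needle = source_string.lower()
    return [s for s in ui_snippets if needle in s.lower()][:limit]

def build_prompt(source_string, target_lang, ui_snippets):
    """Combine the source string with retrieved UI context into one prompt."""
    context = retrieve_context(source_string, ui_snippets)
    context_block = "\n".join(f"- {c}" for c in context) or "- (no context found)"
    return (
        f"Translate the UI string into {target_lang}.\n"
        f"String: {source_string}\n"
        f"Where the string is used:\n{context_block}"
    )

# Hypothetical snippets taken from a UI code base.
ui_snippets = [
    '<button id="open-file" title="Open a file">Open</button>',
    "<label>Close</label>",
]

print(build_prompt("Open", "French", ui_snippets))
```

Here the retrieved snippet shows that "Open" is a button label for an action on a file rather than a status such as "open ticket", which is the kind of distinction a short string alone doesn't convey.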
Embrace Responsible AI Principles
It’s important that GenAI systems be built to provide a helpful, safe, and trustworthy experience for everyone around the world. Responsible AI practices are intended to keep people and their goals at the center of the design process and consider the benefits and potential harms that AI systems can have on society. These principles are:
- Fairness
- Reliability and safety
- Privacy and security
- Inclusiveness
- Transparency
- Accountability
These principles require you to ensure that translations generated by LLMs accurately reflect the meaning of the source text and are free from cultural, gender, or other in-group biases. For example, an LLM trained solely on Castilian Spanish source materials might use vocabulary and idioms that are less common in Central and South America.
Responsible AI should involve mitigating the risk of this type of cultural bias. By partnering with your linguist team, you’ll be able to identify where your model isn’t meeting the needs of the target market. You’ll then be able to train your model to generate vocabulary and idioms that are appropriate for each market.
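A glossary maintained with your linguists can support this kind of review. The sketch below scans a translation for terms that linguists have flagged as unsuitable for a given market. The glossary content, the market code, and the function name `find_regional_mismatches` are hypothetical examples, not a recommended term list.

```python
import re

# Hypothetical glossary: for each market, terms to avoid and the preferred alternative.
REGIONAL_TERMS = {
    "es-MX": {"ordenador": "computadora", "móvil": "celular"},
}

def find_regional_mismatches(text, market):
    """Return (flagged term, preferred term) pairs found in `text` for `market`."""
    findings = []
    for term, preferred in REGIONAL_TERMS.get(market, {}).items():
        # Whole-word, case-insensitive match so partial words are not flagged.
        if re.search(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            findings.append((term, preferred))
    return findings

print(find_regional_mismatches("Reinicia el ordenador.", "es-MX"))
```

Findings from a check like this are prompts for linguist review, and confirmed cases can feed back into the data used to adapt the model for each market.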