It has recently been argued that one of the “faults” of large language models (LLMs) is that they don’t understand the output they produce. This criticism misunderstands generative pre-trained transformers (GPTs), which were designed not to reason but to generate, that is, to predict. These models don’t need to understand what they produce to generate human-quality language. Predicting the next token seems good enough to fool us.
Likewise, rule-based machine translation (MT), statistical MT, and neural MT (NMT) never “understood” what they were translating. They followed rules or probabilities. So why would we expect any type of translation system to “understand”?
Thus, the question is not whether we should use LLMs for translation, but how to make the best use of their contextual and cultural adaptation capabilities for localization while keeping their true nature in check: they were designed to generate.
LLMs like GPT-4 have captured the public imagination, and a growing number of open-source models are freely available and can be customized for specific tasks. These models can generate human-like text, answer questions, and even write code. But can they truly bridge the gap between languages and cultures? Recent research by Pangeanic and others suggests that we might be closer to this goal than we think, despite the well-known limitations of LLMs.
The Challenges of LLMs
While LLMs have shown impressive capabilities in generating fluent text across multiple languages, they’re not without their flaws. As critics point out, these models can produce “hallucinations” — confidently stated but entirely fabricated information. They also lack common sense, often failing to grasp the nuances of context and culture that are crucial in human communication.
Dr. Pilar Orero, a computational linguist at the Universitat Autònoma de Barcelona (UAB) and frequent partner of Pangeanic in European Union (EU) research grants, explains, “LLMs are essentially sophisticated pattern recognition machines. They can produce grammatically correct and even idiomatic text, but they don’t truly understand the world in the way humans do.”
The Context Conundrum
One of the biggest challenges in cross-cultural communication is the varying importance of context across cultures. Anthropologist Edward T. Hall famously categorized cultures as high- or low-context, depending on how much of the meaning in communication is implicit versus explicit. This distinction presents a significant hurdle for MT systems, including those powered by LLMs.
In high-context cultures, such as those found in many Asian, Mediterranean, and Latin American countries, communication relies heavily on shared understanding, implicit meanings, and non-verbal cues. In these cultures, what’s not said is often as important as what is said. The context — the setting, the relationship between speakers, and shared cultural knowledge — carries a significant portion of the message.
Consider the Spanish phrase “lo vamos viendo” (“we will see to it as we go on”). To a native Spanish speaker, particularly from a Mediterranean culture, this seemingly simple phrase can convey a wealth of meaning. It suggests flexibility in the face of the challenges ahead, a reluctance to fully commit, and an openness to adapting plans as circumstances change. Orero notes that “it’s a perfect example of Mediterranean improvisation that might leave an American or Scandinavian bewildered.”
Low-context cultures, prevalent in the United States (US), Germany, and the Scandinavian countries, tend to communicate more directly and explicitly. Information is more likely to be conveyed in words rather than context. It is worth noting that even within the same language, American Spanish speakers tend to find European Spanish speakers very direct.
This cultural divide can lead to amusing, or frustrating, misunderstandings. Take the example of asking a passerby to take a photo of you. A Finnish person might simply reply “no” without offering any explanation or apology. While this direct response might be perfectly acceptable in Finnish culture, it would leave Southern Europeans stunned. Maria Angeles Garcia (Head of MT at Pangeanic) explains, “In Spain, even if someone couldn’t or didn’t want to take the photo, they would likely offer profuse apologies and explanations. The lack of these social niceties can make low-context communicators seem rude to those from high-context cultures.”
The differences extend beyond casual interactions. As Angeles Garcia and Pangeanic’s Head of Production Ángela Franco verified a few months ago in several business settings, these cultural communication styles can significantly impact negotiations and relationships. Imagine a business meeting in Japan versus one in the US. In Japan, a high-context culture, meetings are often filled with ceremony and protocol. Direct discussion of the business at hand might be considered crude. Reading between the lines and understanding unspoken agreements are crucial skills.
Contrast this with a typical American business meeting, where participants are likely to “get down to business” quickly, explicitly stating goals, presenting data, and seeking clear agreements or disagreements within the meeting. What’s considered efficient and professional in one culture might be seen as rushed and impolite in another.
These cultural differences in communication style present a formidable challenge for MT systems. How can AI understand that “lo vamos viendo” might require a completely different translation depending on the context? How can it capture the unspoken elements of a Japanese business negotiation?
Angeles Garcia, who has extensive experience in NMT and multilingual MT, points out, “LLMs are trained on vast amounts of text data, which allows them to capture some of these cultural nuances. However, they still struggle with the deeper understanding required to consistently navigate these complex cultural contexts.”
Even within a single culture, context can vary greatly depending on the situation. Orero explains, “All societies combine both types of communication. There’s no language that’s entirely independent of context for correct understanding of what it expresses.”
This complexity extends to non-verbal communication, as well. “Mediterranean societies might favor more gesticulation,” Orero notes, “while Nordic cultures might be less expressive gesturally, but perhaps much more subtle. Gestures or tones that might go unnoticed by a non-native speaker could be very revealing to a native.”
The challenge for LLMs and MT systems, then, is not just to translate words, but to translate entire cultural contexts. They need to reflect not just what is said, but how it’s said, why it’s said, and crucially, what isn’t said.
As we push the boundaries of AI in translation, addressing this context conundrum becomes increasingly important. The goal is not just linguistic accuracy, but cultural fluency — the ability to tackle the complex, often unspoken rules that govern communication in different cultures.
A New Approach: Deep Adaptive RAG-Based Automatic Post-Editing
Deep Adaptive, developed as part of a collaboration with Valencia’s Polytechnic Pattern Recognition and Human Language Lab, has long been Pangeanic’s NMT flagship. However, Pangeanic’s most recent research offers an even more promising solution. As part of a national research project, our team has developed a system that combines the power of LLMs with specific retrieval-augmented generation (RAG), vector databases, and agentic verifiers to create more accurate and culturally appropriate translations.
The system works as follows (a simplified code sketch of the pipeline appears after the list):
- An initial translation is generated using a state-of-the-art NMT model.
- This translation is then processed by a RAG system, which uses vector databases to retrieve relevant context, terminology, and style information.
- Finally, an LLM, fine-tuned for post-editing (PE), refines the translation based on the retrieved information.
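The sketch below illustrates this three-stage flow in Python. It is purely illustrative: the function names (translate_nmt, retrieve_context, post_edit_llm) and their placeholder bodies are assumptions standing in for the fine-tuned NMT model, the vector-database retrieval layer, and the PE-tuned LLM, not Pangeanic’s actual implementation.

```python
# Illustrative pipeline sketch. Each stage function is a hypothetical
# placeholder for the component described in the article: a fine-tuned NMT
# model, a vector-database retrieval layer, and a post-editing LLM.
from dataclasses import dataclass, field


@dataclass
class RetrievedContext:
    terminology: dict[str, str] = field(default_factory=dict)  # source term -> preferred rendering
    style_notes: list[str] = field(default_factory=list)       # register, tone, formality, domain


def translate_nmt(source: str, src_lang: str, tgt_lang: str) -> str:
    """Stage 1: baseline translation from the NMT model (placeholder)."""
    return f"[draft {src_lang}->{tgt_lang}] {source}"


def retrieve_context(source: str, domain: str) -> RetrievedContext:
    """Stage 2: query the vector database for terminology and style guidance (placeholder)."""
    return RetrievedContext(
        terminology={"lo vamos viendo": "we will see to it as we go"},
        style_notes=[f"domain: {domain}", "prefer a flexible, informal tone"],
    )


def post_edit_llm(draft: str, context: RetrievedContext) -> str:
    """Stage 3: the PE-tuned LLM refines the draft using the retrieved context (placeholder)."""
    return draft  # a real system would prompt the LLM with the draft plus the context


def translate(source: str, src_lang: str, tgt_lang: str, domain: str) -> str:
    draft = translate_nmt(source, src_lang, tgt_lang)
    context = retrieve_context(source, domain)
    return post_edit_llm(draft, context)


if __name__ == "__main__":
    print(translate("Lo vamos viendo.", "es", "en", "marketing"))
```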
Angeles Garcia is the lead researcher on the project, with Franco ground-testing every step. They explain, “Our system doesn’t just translate words; it translates context. By leveraging vast databases of domain-specific knowledge and cultural information (what the user deems relevant), we can produce translations that are not only accurate, but also culturally appropriate.”
The Workflow: A Three-Step Process
The system works through a sophisticated three-step process (a sketch of the retrieval and post-editing steps follows the list):
- Initial Translation: The source text is first translated using a state-of-the-art NMT model. For this study, the team used a heavily fine-tuned version of Meta’s “No Language Left Behind” model, which supports over 200 languages. “We fine-tuned the model on 20 million words of carefully selected, reviewed, and cleansed data from our proprietary repository,” Angeles Garcia notes. “This gave us a strong baseline that already had some domain-specific knowledge.”
- Context Retrieval: The initial translation is then processed by a RAG system. This step is crucial for addressing the context conundrum. Orero, who wasn’t involved in the study but has reviewed the findings and will be using them in an EU project for the audiovisual sector, explains, “In essence, the RAG system uses vector databases to retrieve relevant contextual information, terminology, and style guidelines. It’s like giving the translation system access to a vast library of cultural and domain-specific knowledge, which is very important in multilingual automatic subtitling, for example.” The vector database used in this study contained 100,000 terminology pairs and style examples for each language pair, covering domains as diverse as healthcare, software, journalism, marketing, law, hydrology, public administration, and the automotive industry.
- PE Refinement: Finally, an LLM that is fine-tuned specifically for PE tasks refines the translation based on the retrieved information. “This is where the magic happens,” Franco adds. “The LLM doesn’t just substitute words based on the retrieved information. It uses its deep understanding of how a language works to seamlessly integrate the contextual nuances, terminology, and style guidelines into a fluent, culturally appropriate translation.”
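To make the second and third steps concrete, the sketch below retrieves the terminology pairs closest to the source sentence in a toy embedding space and folds them, together with style guidance, into a post-editing prompt for the LLM. The embedding function, the sample glossary, and the prompt wording are all hypothetical stand-ins; the study’s actual vector database, embedding model, and prompt templates are not reproduced here.

```python
# Illustrative sketch of RAG-style retrieval feeding a post-editing prompt.
# embed() is a toy stand-in for a real sentence-embedding model; the prompt
# template is an assumption, not the prompt used in the study.
import numpy as np

# Hypothetical glossary entries of the kind a vector database might hold.
TERMINOLOGY = [
    ("lo vamos viendo", "we will see to it as we go"),
    ("orden del día", "meeting agenda"),
    ("acta de la reunión", "minutes of the meeting"),
]


def embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic toy embedding (hash-seeded); similarity scores are not
    meaningful here, so a real system would use a trained sentence encoder."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)


def retrieve_terms(source: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k terminology pairs whose source side is closest to the sentence."""
    q = embed(source)
    scored = [(float(q @ embed(src)), (src, tgt)) for src, tgt in TERMINOLOGY]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [pair for _, pair in scored[:k]]


def build_pe_prompt(source: str, draft: str, style: str) -> str:
    """Assemble a post-editing prompt from the draft, style notes, and retrieved terms."""
    terms = retrieve_terms(source)
    glossary = "\n".join(f"- '{src}' -> '{tgt}'" for src, tgt in terms)
    return (
        "Post-edit the draft translation so it is fluent and culturally appropriate.\n"
        f"Style guidance: {style}\n"
        f"Glossary (use these renderings when they apply):\n{glossary}\n"
        f"Source: {source}\nDraft: {draft}\nPost-edited translation:"
    )


print(build_pe_prompt("Lo vamos viendo.", "We go seeing it.", "informal, flexible tone"))
```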