Large language models (LLMs) have made remarkable strides in producing fluent and accurate translations across dozens of languages, but when it comes to cultural nuance, they’re still missing the mark. A new study by Appen, a global leader in data for AI, reveals that even the most advanced multilingual models struggle to localize marketing content effectively, especially when subtle tone, humor, or idiomatic expression is involved.
The gap between translation and localization
The report, titled “Multilingual AI and Cultural Nuance: Evaluating Localization Performance of LLMs,” examines how current models perform across 23 languages, from Spanish and Japanese to Gujarati and Igbo. While the models generally succeed at literal translation and grammar, they consistently fall short when assessed for localized relevance and emotional resonance.
Appen’s researchers focused on marketing copy that requires a high degree of cultural sensitivity, including wordplay, slogans, and figurative language. The team found that while most LLMs could technically render the content into another language, they often stripped away its original tone, failed to carry over humor, or produced messaging that sounded awkward or confusing in the target culture.
New metrics for a more nuanced AI
To address this shortfall, the study proposes a new framework for evaluating AI translations, one that goes beyond rewarding grammatical correctness to also assess tone fidelity, intent preservation, and cultural alignment. Human reviewers played a key role in identifying when LLM outputs sounded “off,” even when the grammar was flawless.
“Cultural nuance isn’t optional — it’s the difference between being understood and being ignored,” the report notes. “If we’re serious about global communication, we need models that adapt beyond words.”
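To make the idea of scoring beyond grammar concrete, here is a minimal Python sketch of how reviewer ratings on those dimensions might be recorded and aggregated. It is not Appen's actual framework: the 1-to-5 scale, the 3.5 threshold, and every name in the code (`Review`, `aggregate`, `flag_weak_dimensions`) are assumptions made purely for illustration. Only the dimension names come from the report summary.

```python
from dataclasses import dataclass
from statistics import mean

# Dimension names follow the report summary; the 1-5 scale is an assumption.
DIMENSIONS = ("grammar", "tone_fidelity", "intent_preservation", "cultural_alignment")


@dataclass
class Review:
    """One human reviewer's ratings for a single localized output (hypothetical)."""
    reviewer: str
    scores: dict  # maps each dimension name to an integer rating from 1 to 5


def aggregate(reviews):
    """Average each dimension across all reviewers."""
    return {d: mean(r.scores[d] for r in reviews) for d in DIMENSIONS}


def flag_weak_dimensions(averages, threshold=3.5):
    """Return the non-grammar dimensions whose average falls below the threshold.

    The threshold is an illustrative assumption, not a value from the report.
    """
    return [d for d in DIMENSIONS if d != "grammar" and averages[d] < threshold]


reviews = [
    Review("reviewer_a", {"grammar": 5, "tone_fidelity": 2,
                          "intent_preservation": 4, "cultural_alignment": 3}),
    Review("reviewer_b", {"grammar": 5, "tone_fidelity": 3,
                          "intent_preservation": 4, "cultural_alignment": 2}),
]

averages = aggregate(reviews)
weak = flag_weak_dimensions(averages)
print(averages)
print(weak)
```

In this made-up example the grammar score is perfect, yet tone fidelity and cultural alignment are both flagged, which mirrors the kind of output reviewers described as sounding “off” despite flawless grammar.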
Implications for global content creators
For companies relying on LLMs to scale content globally, the message is clear: these tools are powerful but incomplete without human insight. Appen recommends pairing AI workflows with human validation to ensure that localization retains both the message and the emotion behind it.
As generative AI continues to shape the future of multilingual communication, Appen’s findings serve as a timely reminder: translation is not localization, and cultural nuance still requires a human touch.

