I started in game production in 1998 and worked on AAA titles including Commandos, TimeSplitters 2, and Who Wants to Be a Millionaire? Today, I spend much of my time in conversations with studio leaders, producers, engineers, and localization teams across the industry, which gives me a close view of how the same operational challenges continue to repeat under new layers of technology.
Across more than 25 years of watching this evolution firsthand, I have seen the tooling change dramatically while production pressure has only increased. Yet the same failures still show up: missing context, broken handoffs, and testing that happens too late.
The rush toward artificial intelligence (AI) today often feels like a new version of the same old chaos, only now the chaos can scale much faster.
When I first stepped into game production in 1998, localization was still very much the Wild West. Translatable text often lived directly inside source code, mixed with logic, variables, and implementation details that were never designed for linguists to touch. If a studio wanted to localize, somebody had to manually dig through those files, extract the relevant strings, and somehow preserve enough structure so they could be translated and safely returned to the build.
Over the years, the infrastructure improved dramatically. The arrival of Unity and Unreal Engine changed the baseline for how studios could manage multilingual content. String IDs became a far more reliable backbone for moving content safely between the game and the localization workflow. Instead of extracting text from source code, teams could export structured localization files and let the engine dynamically pull the correct language based on platform or regional settings.
The workflow evolved further with cloud collaboration, APIs, and centralized content systems. Instead of emailing spreadsheets back and forth, teams could now push content directly from the development environment into a shared workspace, translate it in parallel, review it with version visibility, and safely pull it back into the build. The process became faster, more visible, and less dependent on manual file handling.
Shipping changed just as dramatically. I still remember driving Gold Master discs across the United Kingdom in the middle of the night to hit manufacturing deadlines. Today, digital distribution and day-one patches allow studios to continue refining content even after release. From a tooling perspective, the industry is unquestionably stronger than it was 25 years ago.
Yet stronger infrastructure did not remove production pressure. If anything, it increased expectations. Teams are now shipping into more markets, across more platforms, with more simultaneous releases and far more live content updates than we ever imagined in the late 1990s. The speed improved, but the pressure never left.
What continues to surprise me is how much of the operational risk still looks familiar. The tools changed, but the weak points often did not. I still see studios struggling with duplicate string IDs, content branches that lose their relationship to screenshots or voice assets, item names hardcoded too late to localize, and dialogue trees that expand faster than teams can realistically test them.
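Some of these weak points are cheap to catch automatically. As a minimal sketch, assuming a hypothetical export in which each row is a (string ID, source text) pair, a duplicate-ID check could look like this. The function name and the data shape are illustrative and not taken from any particular engine or tool.

```python
from collections import defaultdict


def find_duplicate_ids(rows):
    """Return {string_id: [texts...]} for every ID that appears more than once.

    `rows` is an iterable of (string_id, source_text) pairs, a hypothetical
    shape for a localization export.
    """
    seen = defaultdict(list)
    for string_id, text in rows:
        seen[string_id].append(text)
    return {sid: texts for sid, texts in seen.items() if len(texts) > 1}


# Illustrative data only.
rows = [
    ("menu.start", "Start Game"),
    ("menu.quit", "Quit"),
    ("menu.start", "Begin"),  # same ID, different text: a conflict
]
duplicates = find_duplicate_ids(rows)
print(duplicates)
```

Running a check like this on every export, rather than once before release, keeps the conflict visible while it is still a one-line fix.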
Spreadsheet culture also still survives in places where teams feel safer managing complexity through manual visibility rather than structured integration. While understandable, this often creates version drift, duplicated effort, and unclear ownership. The tooling may look modern, but the operational discipline underneath it is often still carrying habits from a much earlier time.
The mindset challenges remain as well. There are still teams that assume English-first is good enough or that localization can wait until core development decisions are already locked. Tutorial flows, monetization prompts, live events, and seasonal content continue to change until the final moment, forcing localization and quality assurance into compressed timelines. What changed is not the existence of complexity but how expensive those same mistakes become at scale.
This is where AI enters and where recurring workflow problems that studios have managed for years begin to move at a different speed.
Modern games are no longer static products but living systems shaped by branching dialogue, voice updates, player-generated variables, seasonal events, downloadable content (DLC) expansions, and cross-platform releases — each layer adding new ways for meaning to drift if it is not handled carefully.
Studios often focus on whether AI output is good enough, but that question misses the real issue, which is whether the surrounding workflow is capable of carrying that output safely and consistently into production.
AI acts as an accelerator. When context is missing, when terminology exists only in someone’s head or scattered across old spreadsheets, and when project knowledge has never been structured or centralized, AI does not compensate for those gaps. It moves through them faster and at greater scale, allowing issues that were once contained to propagate across tens of thousands of strings before they are noticed.
This is why context is often underestimated in practice. Sometimes it exists only in a generic form, such as broad tone-of-voice notes that do not help a model make decisions in specific situations. More often, the relevant context is available somewhere, but assembling it requires manual effort across disconnected systems, which rarely happens consistently under production pressure.
The conversation, then, shifts away from speed alone and toward whether workflows are strong enough to preserve meaning, structure, and player trust as content moves through increasingly complex systems.
To help studios think through this more clearly, I use a framework called WAVE. WAVE stands for Workflow, Augmented context, Validation, and Expansion. The order is intentional: it describes the maturity stages that separate lasting value from expensive lessons.
The most common mistake I see is studios trying to introduce AI into something they call a workflow, but which in reality behaves more like a collection of disconnected habits and workarounds.
Files move through email, updates are tracked in scattered comments, vendors operate in one tool while internal reviewers rely on another, and the supporting context — from assets and reference materials to metadata — is often spread across different locations as well. There is no single source of truth or shared visibility into what has changed, what has been approved, and what is still waiting.
When AI is introduced into this kind of environment, the existing issues do not stay contained but begin to multiply in ways that are harder to control. AI does not handle ambiguity particularly well because it depends on clean, structured input. So when the source content is fragmented, incomplete, or poorly structured, the output reflects that same fragmentation — only faster and across every language at once.
Before a studio can use AI in a meaningful and reliable way, it needs to reduce the manual handoffs that slow coordination and introduce inconsistencies. Content, translations, metadata, and project history need to exist in one connected system, where updates can flow automatically instead of being passed manually between people and tools, and where linguists and developers can work from the same up-to-date information.
I have had this conversation many times over the years, and it consistently leads back to the same realization. When content is managed in a structured and centralized platform with real-time collaboration and full version history, the localization team is no longer chasing updates but instead working from a live and accurate view of what the game actually contains.
That shift in visibility alone removes a significant amount of manual effort and rework, which otherwise continues to drain both timelines and budgets.
Workflow is not just one step in AI adoption but the foundation that everything else depends on, and without it, AI does not deliver sustainable value. What it delivers instead is speed applied to an already unstable system, which simply results in faster chaos. This is the problem Gridly was built to solve — bringing content, translations, metadata, and workflow into one connected environment so AI has a stable foundation from which to operate.
Once the workflow is structured correctly, the next stage shifts toward what actually goes into the prompt, which is where most off-the-shelf AI translation approaches begin to fall short, especially in the context of game localization.
A game is not a generic document but a layered experience with characters who have distinct voices, in-world terminology that potentially carries meaning only within that universe, and such constraints as style rules, user interface (UI) length limits, and tone expectations that shift depending on who is speaking and to whom.
When an AI model operates without access to that depth of context, the output may appear linguistically correct on the surface but still feel disconnected from the game itself, because it lacks the nuance that defines the player experience.
The limitation is not in the model but in the context it receives. A natural starting point when first adopting AI is to run a single, consistent prompt across the full content batch, applying the same instructions across thousands of strings without awareness of character roles, UI constraints, or tone variations tied to specific situations.
Augmented context changes this by assembling the right information for each piece of content individually, drawing from glossaries and translation memories built over time while also incorporating project variables, style guides, character descriptions, and tone-of-voice references at the segment level, rather than treating everything as a uniform batch. Instead of relying on a single static prompt, it dynamically composes the relevant context for each content unit using the structured data already available in the platform, ensuring that the model receives instructions that reflect the content it is actually processing.
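To make the idea of segment-level context concrete, here is a minimal sketch of per-segment prompt assembly. Everything in it is an assumption for illustration: the `Segment` fields, the `build_prompt` function, the glossary entries, and the character description are hypothetical, and the sketch does not represent how any specific platform composes prompts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    string_id: str
    source: str
    speaker: Optional[str] = None    # character speaking, if any
    max_chars: Optional[int] = None  # UI length limit, if any


def build_prompt(segment, target_lang, glossary, characters, style_guide):
    """Compose a prompt that carries only the context relevant to one segment."""
    parts = [
        f"Translate the following game text into {target_lang}.",
        f"Style: {style_guide}",
    ]
    # Include only the glossary terms that actually occur in this segment.
    terms = {src: tgt for src, tgt in glossary.items()
             if src.lower() in segment.source.lower()}
    if terms:
        term_lines = "; ".join(f"{s} -> {t}" for s, t in sorted(terms.items()))
        parts.append(f"Required terminology: {term_lines}")
    # Add the speaker's voice description when the segment is dialogue.
    if segment.speaker and segment.speaker in characters:
        parts.append(f"Speaker: {segment.speaker} ({characters[segment.speaker]})")
    if segment.max_chars is not None:
        parts.append(f"Maximum length: {segment.max_chars} characters.")
    parts.append(f"Text: {segment.source}")
    return "\n".join(parts)


# Illustrative data only.
glossary = {"Ember Blade": "Glutklinge", "Mana": "Mana"}
characters = {"Kara": "gruff veteran, short sentences"}
seg = Segment("dlg.kara.012", "Take the Ember Blade and go.",
              speaker="Kara", max_chars=40)
prompt = build_prompt(seg, "German", glossary, characters, "informal address")
print(prompt)
```

The point of the design is that a UI label with no speaker and a line of dialogue with a length limit produce different prompts from the same function, because the prompt is derived from the segment's own metadata rather than written once for the whole batch.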
The impact of this shift becomes clear in the output quality, as prompts that reflect the specific attributes of each segment allow the model to stay closer to the game’s identity, reducing the number of corrections needed and limiting unnecessary back-and-forth between localization teams and developers.
First-pass quality improves in a meaningful way, and the post-editing effort moves closer to what it should be — focused on refining nuance and intent rather than correcting issues caused by missing or incomplete context.
This is also where structured content management delivers direct value, because a platform that already holds glossaries, translation memory, metadata, and project history can assemble this context automatically for each content unit — something that is nearly impossible to achieve consistently with spreadsheets and shared drives.
In the end, the quality of AI output is shaped by the quality of the context feeding it, which makes augmented context not an enhancement but a requirement for reliable results.
Validation is another area where many AI localization efforts begin to break down — not because teams do not care about quality but because they lack a structured way to test and confirm output before it reaches production scale.
In many setups, AI is applied directly to large batches of content without taking into account how the model might behave across different types of strings. This makes it difficult to catch issues early, and problems are often discovered only after they have already spread across multiple languages. At that point, the cost of fixing them is significantly higher than it would have been at the start.
A more reliable approach is to build controlled execution into the workflow itself, so that testing, previewing, and refining output happens as a structured step before scale is introduced, rather than as a reaction to problems discovered afterward.
Instead of relying on assumptions about prompt quality, teams can run smaller test batches and review how the model performs across real content scenarios. Based on what they observe, they can adjust prompts or context inputs in a controlled way. This creates a feedback loop that improves results before volume becomes a factor while also building a clearer understanding of how the AI behaves under production-like conditions.
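A small test batch of this kind can be sketched as follows. The sketch assumes segments are plain dictionaries with a `type` field and that placeholders use a `{name}` syntax; both are assumptions for illustration, as are the function names. It samples a few strings from each content type and runs two cheap automated checks on the output before a human reviewer looks at tone and terminology.

```python
import random
import re
from collections import defaultdict

# Assumed placeholder syntax: {name}. Real projects may use other formats.
PLACEHOLDER = re.compile(r"\{[A-Za-z0-9_]+\}")


def pick_test_batch(segments, per_type=2, seed=0):
    """Sample up to `per_type` segments from each content type."""
    by_type = defaultdict(list)
    for seg in segments:
        by_type[seg["type"]].append(seg)
    rng = random.Random(seed)  # fixed seed so the batch is reproducible
    batch = []
    for content_type in sorted(by_type):
        group = by_type[content_type]
        batch.extend(rng.sample(group, min(per_type, len(group))))
    return batch


def check_output(source, translation, max_chars=None):
    """Return a list of issues found by cheap automated checks."""
    issues = []
    if sorted(PLACEHOLDER.findall(source)) != sorted(PLACEHOLDER.findall(translation)):
        issues.append("placeholder mismatch")
    if max_chars is not None and len(translation) > max_chars:
        issues.append("too long")
    return issues


# Illustrative data only.
segments = [
    {"id": "ui.1", "type": "ui", "source": "Buy {count} gems"},
    {"id": "ui.2", "type": "ui", "source": "Settings"},
    {"id": "ui.3", "type": "ui", "source": "Back"},
    {"id": "dlg.1", "type": "dialogue", "source": "Welcome back, {player}."},
]
batch = pick_test_batch(segments, per_type=2)
print([seg["id"] for seg in batch])
print(check_output("Buy {count} gems", "Acheter des gemmes"))
```

Sampling by content type matters because a prompt that works well on dialogue can still fail on short UI labels, and a purely random sample from a dialogue-heavy project may never include one.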
Early visibility into output quality allows teams to identify inconsistencies, tone mismatches, or terminology issues while the cost of addressing them is still low. Prompts can be tested, refined, and shared across the team with a clear record of how they perform. This shifts AI adoption away from a one-shot execution model toward something more deliberate and repeatable.
The practical impact is significant. Controlled rollout reduces the risk of retranslation, prevents low-quality output from scaling across languages, and protects both timelines and budgets from avoidable rework.
In this sense, validation is not a reactive step at the end of the pipeline but an integrated part of how AI workflows are designed from the start. It ensures that quality is established before scale is introduced, rather than discovered to be missing after the fact.
The final stage, Expansion, is about scale. Scale is not the goal in itself; it is the payoff that makes the earlier investment worthwhile.
A studio that has a clean workflow, enriched context, and a validated AI process is in a fundamentally different position from one that has simply connected a large language model API to a content export. The first studio can run AI across large content batches with confidence because it knows the inputs are structured, the prompts are calibrated, and the output has been tested. The second studio is operating on hope.
Scale in localization introduces several distinct challenges. Processing hundreds of thousands of strings across multiple languages simultaneously is not just a volume problem but an orchestration problem. Different languages introduce different rules, edge cases, and failure modes. Arabic behaves differently from German, and Japanese character constraints are not the same as French text expansion. Each language requires its own calibration rather than relying on a single global prompt applied uniformly.
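Per-language calibration can be as simple as keeping an explicit profile for each target language instead of one global setting. The sketch below is illustrative only: the `LanguageProfile` structure is hypothetical, and the expansion ratios are placeholder values chosen for the example, not measured figures. A real project would derive them from its own translation memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    code: str
    rtl: bool         # right-to-left script
    expansion: float  # ASSUMED length ratio vs. the English source
    notes: str        # language-specific instruction added to the prompt


# Placeholder values for illustration, not measured data.
PROFILES = {
    "de": LanguageProfile("de", False, 1.3, "Watch long compound nouns in UI labels."),
    "fr": LanguageProfile("fr", False, 1.2, "Expect text expansion; keep buttons short."),
    "ja": LanguageProfile("ja", False, 0.6, "Respect per-line character limits."),
    "ar": LanguageProfile("ar", True, 1.0, "Right-to-left; keep placeholders intact."),
}


def at_risk(source, ui_limit, profile):
    """Flag strings whose estimated translated length would exceed the UI limit."""
    estimated = len(source) * profile.expansion
    return estimated > ui_limit


# "Start Game" is 10 characters; the UI limit here is 12.
for code in ("de", "ja"):
    print(code, at_risk("Start Game", 12, PROFILES[code]))
```

With these placeholder ratios, the same ten-character label is flagged for German and passes for Japanese, which is the kind of difference a single global prompt cannot express.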
Live operations add another layer of complexity. A studio running a live game is not managing a one-time localization project but a continuous stream of new strings, event content, balance updates, and patch notes that need to be delivered across all supported languages quickly and consistently. Manual processes cannot keep up with that pace, and an AI system without a workflow foundation cannot maintain consistency over time. Only a connected and context-aware process can handle that volume without quality eroding update by update.
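One building block of such a connected process is knowing exactly which strings changed between updates, so that only those are sent for translation. A minimal sketch, assuming the pipeline stores a hash of each source string at the time it was last translated; the function names and the snapshot format are hypothetical:

```python
import hashlib


def fingerprint(text):
    """Stable hash of a source string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_strings(previous_hashes, current):
    """Compare stored {string_id: hash} against current {string_id: source_text}."""
    prev_ids, cur_ids = set(previous_hashes), set(current)
    added = sorted(cur_ids - prev_ids)
    removed = sorted(prev_ids - cur_ids)
    changed = sorted(
        sid for sid in cur_ids & prev_ids
        if fingerprint(current[sid]) != previous_hashes[sid]
    )
    return {"added": added, "changed": changed, "removed": removed}


# Illustrative data only.
previous_hashes = {
    "evt.title": fingerprint("Winter Festival"),
    "evt.body": fingerprint("Collect 10 snowflakes."),
    "old.banner": fingerprint("Sale ends soon"),
}
current = {
    "evt.title": "Winter Festival",
    "evt.body": "Collect 20 snowflakes.",  # balance update changed the number
    "evt.reward": "Frost Cape",            # new string for this event
}
delta = diff_strings(previous_hashes, current)
print(delta)
```

Only the added and changed IDs need new translations, while unchanged strings keep their approved text, which is what keeps quality from eroding update by update.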
There is also the longer-term question of what accumulates over time. A mature localization pipeline builds value with every release as translation memories grow, glossaries become more refined, and style decisions are documented and reinforced. When AI is embedded within that system rather than operating beside it, each new release benefits from everything that came before. The third DLC becomes easier and more cost-efficient to localize than the first because the context is richer and the prompts are better calibrated.
Studios that expand AI too early — before the workflow and context stages are in place — often see expected cost savings disappear. Setup effort increases, post-editing overhead remains high, and quality issues force retranslation. What initially looks like an efficiency gain quickly becomes a more expensive version of the same underlying problem.
One thing is often overlooked. The conversation around AI in localization has been dominated by output quality comparisons and cost per word estimates. Those factors matter, but they are not the primary risk.
The primary risk is deploying AI into a localization process that cannot support it and discovering this only when the output has shipped to players in 17 languages.
WAVE is a maturity model, not a checklist. Studios do not move through it in a straight line, and not every team starts from the same place. Some have solid workflow foundations but have never approached context enrichment in a systematic way. Others have strong linguistic review processes but no reliable way to test AI output at scale before running a full batch. The framework helps teams identify where the gaps are and address them in the right sequence.
After 25 years of watching the game industry navigate new technology, I see one pattern that continues to stand out: tools may lag behind at first, but they rapidly evolve to meet changing requirements. The real constraint is whether the operational foundation is strong enough to support them. That was true when teams moved from text embedded in code and loosely managed files to structured, externalized content with string IDs, and when workflows shifted from email handoffs to cloud-based collaboration. It remains true now with AI.
The studios that gain lasting value from AI localization are not necessarily those with the largest budgets or the most advanced infrastructure. They are the ones that approach AI adoption as a workflow maturity problem rather than a simple technology swap. In my experience, that shift in perspective is what separates teams that ride the wave from those that get knocked over by it.
If there is one piece of advice I would offer to teams exploring AI localization, it is this: Focus on maturing the process first, because that is what ultimately determines whether AI can deliver production-ready results — something I have consistently observed at Gridly across different studio environments.
Michael Souto spent over a decade as a video game producer before moving into localization technology. He has worked on both sides of the table, inside development studios and alongside the tools teams use to manage multilingual content at scale. Today he is Director of Business Development at Gridly, where he works closely with game studios on the operational realities of localization.