Machine translation (MT) has been a hot topic for a while, especially since the rise of neural machine translation (NMT). The jump in output quality is finally shaping solutions that can handle ever-increasing volumes and accelerating demand, a remarkable development even for the prudent gaming industry. Electronic Arts was an early adopter and gained some insights along the way.
The great spike in MT output quality can be dated back to 2016, when Google launched its neural system (GNMT). The technology has come a long way since the clumsy rule-based MT of the 1980s and the improved statistical MT that dominated the scene in the decades that followed. Google was a pioneer in providing easy access to NMT, claiming that in some cases human and GNMT translations were nearly indistinguishable.
MT technology has been available to the game localization industry on PCs for over 25 years without much success. Any technology that affects the traditional role of translators is not adopted easily, and MT was for a long time regarded as a standalone solution that would replace translators’ roles, instead of a way to assist them with work.
Despite this, with some tweaking and research, MT has proven to work well in many areas: online catalogs, customer-created content and “gisting,” where understanding the general meaning is the main goal. For video games, however, we need to account for gamers’ expectations — which are rightfully quite high.
Localization is an essential part of the gaming experience for most players around the world: it enables them not only to understand the mechanics and rules of the game, but also to enjoy the gameplay and feel engaged. In other words, quality localization enhances playability beyond mere functionality. That is why the gaming industry is still very cautious about MT implementation.
Apart from immersive quality, in-game texts present some challenges that are quite specific to games.
Terminology. Probably the biggest challenge is terminology, a crucial part of a successful localization effort. Consistency in terminology is fundamental not only to ensure a good gaming experience but also to prevent noncompliance issues that might hinder the release of the game.
Variables and tags. Another important technical challenge is the presence of linguistic variables and tags (Figures 1 and 2). These need to be respected in the translated text, as they will be replaced by the player name (for instance) or by a link to a screen in the game itself. Sometimes they are just cosmetic tags to modify the text format. Mistakes could result in code errors that would provoke functionality and display issues, disrupting playability.
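The tag and variable risk described above can be caught automatically. Below is a minimal sketch of a pre-delivery check that verifies every variable and tag in the source string survives in the translation. The `{PLAYER_NAME}`-style and `<b>`-style patterns are illustrative assumptions; real games use many placeholder formats, so the regex would need adapting to your string syntax.

```python
import re
from collections import Counter

# Hypothetical token formats: {VARIABLE_NAME} placeholders and <b>...</b>-style tags.
# Adjust this pattern to whatever placeholder syntax your game strings actually use.
TOKEN_RE = re.compile(r"\{[A-Z_]+\}|</?[a-z]+>")

def tokens(text: str) -> list[str]:
    """Extract all variables and tags in order of appearance."""
    return TOKEN_RE.findall(text)

def check_tokens(source: str, target: str) -> tuple[list[str], list[str]]:
    """Compare token multisets; return (missing_in_target, spurious_in_target)."""
    src, tgt = Counter(tokens(source)), Counter(tokens(target))
    missing = list((src - tgt).elements())   # tokens the translation dropped
    spurious = list((tgt - src).elements())  # tokens the translation invented
    return missing, spurious
```

A segment with a non-empty `missing` or `spurious` list would be flagged for a post-editor before it could cause a functionality or display issue.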
Creativity. Not only do we have technical components to keep in mind; one of the biggest challenges with NMT in the gaming industry is creativity. Text types vary a lot beyond conventional on-screen text, and the required level of creativity changes with them. Apart from audio recordings, which by nature need to be quite liberal and natural-sounding, we often come across made-up language, puns and jokes that need to be carried over into the target language. So how do you choose whether to apply NMT to a specific game, and if so, where do you start?
MT implementation for video games
Despite these challenges, MT can play a big role in the game localization process. Here is a step-by-step guide to getting started.
1. What is the reason behind MT implementation?
The first question that needs to be addressed is the reason behind the desire to implement MT for video games and what you want to achieve. The answers could be numerous, and most of the time they are related to ever-increasing volumes and accelerated translation timelines.
Speed may be one factor. MT might help satisfy the need to be more agile and increase the speed of delivery, especially given the continuous-delivery approach currently applied throughout the industry.
Utility may be another. MT might be needed if only to understand the general meaning of specific documentation for internal purposes, to avoid dedicating precious resources and time to this task.
Cost could be another factor. MT might save money.
Each video game company might have a specific reason connected to these criteria, but nevertheless there is one common key factor: fit for purpose. If the quality of the raw output is not good enough, the translations won’t be delivered faster or cheaper, and they will not even be understandable. This is why the first thing to consider with NMT is quality, and we will come back to this later.
2. What kind of text?
One important thing to keep in mind is that video game localization does not consist of just the in-game, on-screen text. The biggest chunk of content may come from the in-game text itself, but MT might be applied to other kinds of content. These can be customer support texts, how-to articles, metadata, marketing text, game packs and so on. This means that the purpose and readership of the content (and therefore the level of quality you want to achieve) are the primary considerations to keep in mind.
3. How do you measure success?
Success can be measured in different ways, particularly depending on the reasons connected to the implementation of MT.
Not only is the quality of the output a measurement of success, but there are other things to consider. It is important to measure how much of the text delivered by MT is being edited by post-editors, while also taking into account time-to-market acceleration and productivity increases.
The variety of text types presents different challenges and quality expectations. This means that when assessing quality, it is fundamental to keep in mind the purpose and readership of the content and therefore tailor the relevant quality evaluation method. Readability or accuracy might not be sufficient for a game text.
The most commonly used and accepted automatic MT output quality evaluation method is the BLEU score, a metric that scores translations on a scale of 0 to 1. The closer to 1, the more closely the translation correlates with a human reference translation. Put simply, it measures how many word sequences (n-grams) in the MT output overlap with those in a reference translation, with a penalty for outputs that are too short.
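To make the metric concrete, here is a minimal sentence-level BLEU sketch: clipped n-gram precision up to 4-grams combined with a brevity penalty. This is illustrative only; production evaluations use corpus-level BLEU with smoothing and standardized tokenization (e.g. the sacreBLEU tool) rather than a hand-rolled implementation like this.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Sentence-level BLEU sketch: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    whitespace tokenization only."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_counts & r_counts).values())  # clipped counts
        total = max(sum(c_counts.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision collapses the score in this sketch
        log_prec += math.log(overlap / total) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)
```

An identical candidate and reference score 1.0; a candidate sharing no words with the reference scores 0.0.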
Nowadays there are tools on the market that help with tracking these variables, as well as resources like the TAUS DQF BI Report to benchmark the results against other parties.
4. How do you choose an MT provider?
Once the decision to implement MT for games is made, the next step is provider selection, which is key to the success of the venture.
There are many providers on the market to choose from, depending on whether you prefer to build your own system or have an external partner do it, with the associated cost and time trade-offs.
The quality of the output is an obvious selection criterion. However, in the video game industry, the security of data, especially for unreleased titles, might play an equally important role.
Whether the MT system is built internally or by a service provider, in the case of video games it makes sense to use the company’s own game translation memories (TMs) to train a model that produces translations close to the desired style and quality. These TMs will be stored in the cloud during training, and it is imperative that they not be used to train another company’s systems or a general-purpose engine. The training corpus should be stored in a secure environment inaccessible to anyone outside the company.
5. How will MT integrate with your workflow and tools?
No matter what partner is selected as the MT provider, the chosen solution will need to be integrated within the current translation workflow. All affected stakeholders should be included, and the translation environment should be prepared to best accommodate this new technology, as MT should automate the workflow even further and not create additional overhead.
MT is not a standalone solution; it should integrate with the current translation process. This is not a new concept. Augmented translation puts technology and AI in the service of translators. We recommend combining TM output with MT output where the leverage is lower. As an example, why use MT on a segment that already has a 90% match from the TM?
Not only do the technical aspects need to be considered, but you will need to train your teams. They should be able to best take advantage of the new technology with some post-editing courses, practical exercises and a general understanding of this technology. It will not only allow them to perform correctly, but may also help them change their mindset toward MT, which might be the biggest challenge to face.
Case study: MT at Electronic Arts
Electronic Arts owns a large portfolio of titles, and its localization needs grow constantly. With millions of words needing translation yearly, the localization team strives to provide the best quality in-game translations while being efficient in terms of both speed and cost. MT seemed the perfect solution to support these business needs while keeping pace with the technological evolution happening in the industry. Electronic Arts had been investigating MT for several years before we decided to implement it in April 2019. We wanted to be certain we had all the elements and knowledge needed to face this change, as it would be a revolution not only in the workflow, but especially in the mindset of the team.
We started by categorizing all our text types in order to understand which ones would be more suitable for MT and, based on that, which level of quality we could expect to achieve for each of them. We identified eight main text types based on the gaming experience they inhabited: player feedback, customer support, back translation (from a non-English source), game content, websites, tutorial/user guides, live chat and translation for information.
Then we applied three main criteria to establish the potential for MT implementation. These were utility of the content, speed of delivery and sentiment, which considers the emotional engagement of the player (Figure 3).
Once we identified the text, we started looking into different providers with the following selection criteria in mind:
• Customizability of the engine. We wanted a provider that enabled us to internally control the customization of the engine. Once we could do this, we would be completely autonomous. Our own deep knowledge of our TMs gave us a great advantage in understanding how to best use them for the type of training needed to build an MT model.
• Connectivity with computer assisted translation (CAT) tool. The aim of MT implementation at Electronic Arts was to simplify and automate the workflow further. Thus, one important criterion in selecting the provider was seamless connection with our current CAT tool through an API, avoiding additional steps that would complicate the workflow rather than simplify it.
• Quality of the raw output. Without a doubt, this was important, so before choosing the provider, we made certain to run enough tests and to benchmark the quality levels we wanted to achieve for each type of text. Our aim was an output quality needing the least number of edits possible, as we don’t publish anything without post-editing.
• Cost. MT is neither free nor cheap, and its implementation requires a costly investment in resources and time. A potential cost reduction in production and controlled system maintenance costs opens the door to increasing the scope of localization — for example, by increasing language pairs.
Building language models
Once we chose the provider, we created several language models. Our first approach was to identify one project to apply MT to. After careful analysis, we decided not to limit the implementation to a single project but, risky as it was, to pick a variety of texts so we could understand which challenges we would face and how to overcome them. That is why we picked about 24 projects spanning customer support text, marketing text, internal documentation and in-game text.
Our provider only allowed us to build a language model on top of an existing one, rather than creating one from scratch. Since customer support text represented the biggest volume of the content we wanted to apply MT to, and this text covers a variety of games, we decided to first create engines we called “generic,” meaning they were trained with almost all our TMs, properly prepared and cleaned according to specific criteria. This approach presents challenges: such engines can perform very well on generic texts that do not contain much IP-specific terminology, but they do not perform well on games in which terminology plays a very important role.
With this in mind, we started testing texts selected from a mix of projects to assess the quality of the output and identify potential risks of MT for each type of project.
We decided to combine two approaches in our output quality evaluation. On top of the already-mentioned BLEU method, which we control by carefully selecting the reference translations the MT output is compared to, we added a human qualitative evaluation that assigns scores from 1 to 5 for fluency and adequacy.
The linguist assessed fluency, that is, whether the output is grammatically correct and free of collocation errors, style pitfalls and unnatural language, and adequacy, that is, how much of the meaning and emotion expressed in the source text is present in each target-language translation.
We picked a text for each project and language we wanted to apply MT for (24 projects in total and 27 languages) and had it evaluated by both internal and external linguists to obtain an averaged and less biased score.
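Averaging the 1-to-5 scores from internal and external evaluators, as described above, is straightforward to sketch. The record layout and example values here are illustrative assumptions, not EA's actual evaluation data.

```python
from statistics import mean

# Each rating: (language, evaluator_type, fluency, adequacy), scored 1-5.
# Sample values are invented for illustration.
ratings = [
    ("fr", "internal", 4, 5),
    ("fr", "external", 4, 4),
    ("ja", "internal", 3, 3),
    ("ja", "external", 2, 3),
]

def average_scores(ratings: list[tuple]) -> dict[str, tuple]:
    """Average fluency and adequacy per language across all evaluators,
    internal and external alike, to reduce individual bias."""
    by_lang: dict[str, list[tuple[int, int]]] = {}
    for lang, _evaluator, fluency, adequacy in ratings:
        by_lang.setdefault(lang, []).append((fluency, adequacy))
    return {
        lang: (mean(f for f, _ in pairs), mean(a for _, a in pairs))
        for lang, pairs in by_lang.items()
    }
```

Languages whose averaged scores fall below a chosen bar would be the ones flagged for extra engine training or closer post-editing scrutiny.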
Not all the languages evaluated reached the quality output we were hoping for, but we decided to implement them anyway during our trials. By doing so, we would get more data to be able to improve the quality, thanks to the feedback coming from post-editors.
While we risked creating player dissatisfaction, especially for the low-quality languages, we decided to take a first step toward a big change. Also, and thanks to the good relationship we have with both our internal and external partners, we achieved functional quality and gathered some very valuable feedback that helped us improve our MT processes and output quality in all languages.
This experience helped us readjust our MT rollout strategy and define a plan for the following months, to improve the quality even further as it is a very long and delicate process.
After the implementation, we worked on analyzing the feedback systematically. We are now able to leverage it not only to improve the quality of some specific languages, but to understand what the next approach for our MT strategy could be.
This allowed us to categorize mistakes according to what can be fixed by training the engine, like mistranslation, grammar issues, terminology and everything connected to language and style. We also looked at what can be fixed by improving the CAT integration. For instance, this would be tags and variables issues, glossary violations and spacing issues.
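The triage just described, splitting reported errors between engine retraining and CAT-integration fixes, can be expressed as a simple mapping. The category and remediation names below are illustrative assumptions based on the examples in the text, not an actual taxonomy.

```python
# Illustrative mapping from post-editor error categories to remediation paths,
# following the split described above: language problems go back to engine
# training, mechanical problems go to the CAT integration.
REMEDIATION = {
    "mistranslation": "retrain_engine",
    "grammar": "retrain_engine",
    "terminology": "retrain_engine",
    "style": "retrain_engine",
    "tag_error": "fix_cat_integration",
    "variable_error": "fix_cat_integration",
    "glossary_violation": "fix_cat_integration",
    "spacing": "fix_cat_integration",
}

def triage(feedback: list[str]) -> dict[str, int]:
    """Count reported errors per remediation path; unknown categories
    are routed to manual review rather than silently dropped."""
    counts: dict[str, int] = {}
    for category in feedback:
        path = REMEDIATION.get(category, "needs_review")
        counts[path] = counts.get(path, 0) + 1
    return counts
```

Aggregated counts like these make it easy to see whether the next investment should go into model training or workflow tooling.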
Moreover, what we are considering and testing out now is creating franchise- and game-customized engines, targeting increased stylistic and tone accuracy in the output text and refining the process of cleaning our data for training.
One important concept we kept in mind when we started planning the implementation of MT for our games is that it is not meant in any way to replace human translators, as the current state of MT does not allow publishing text without human intervention, especially in-game text. That is why the workflow we have in place now combines TMs and MT, followed by post-editing and a linguistic quality assurance review by our localization testers.
Recapping and looking forward
The implementation of MT can allow game localization publishers to accelerate time-to-market and to increase the pool of languages offered, creating a new competitive edge.
MT implementation in games is a long and iterative journey, and it is important to bear this in mind. Game text is very peculiar and can vary a lot from IP to IP, making it complicated to find one perfect engine and provider able to service all genres and all languages. We encourage game localization teams to invest now in testing available options until reaching the expected results.
One of the main barriers to the implementation of MT in the gaming industry today is translators’ pushback and the lack of available trained professionals. A good post-editing team is essential to reaching the cost and time margins pursued by this technology, and a deep understanding of sensible quality, cost and time expectations is key to managing stakeholders. Solid and unbiased metrics and processes will be your allies in this journey, and fortunately the localization industry has made great progress here already.