A quest for quality in video game localization

By Demid Tishin May 20, 2014

It was a chance we could not miss. Up until 2011, my company was essentially a single-language vendor, translating a wide range of content from or into Russian, with a focus on huge Chinese and Korean online role-playing games for Russian players.

Suddenly it all changed: mobile and casual games boomed, while the market for massively multiplayer online role-playing games became saturated and stabilized. Hundreds of new mobile developers and publishers appeared, often with little expertise in internationalization and no staff to manage localization in-house, let alone to manage each target language separately. A typical new client wanted only one or two providers to handle a whole line of game titles, as well as regular updates and marketing phrases, and most wanted this done simultaneously into ten or more languages for players in the Americas, Europe and Asia. For us at All Correct, this was an opportunity but also a challenge, as we had only a vague idea of how to manage quality in language combinations that did not involve Russian.

For one client, we translated 25 mobile titles in 2011, some of them into 14 languages. This required a total of 89 translators, all of whom we found in online databases. Initially, the vendor managers had no testing procedure and applied only a few basic filters: the translator had to translate only into his or her mother tongue, reside in the country of the target language and have a positive public track record, as established, for example, by a ProZ.com Willingness to Work Again rating. The workflow was simplistic, too: the project manager selected a translator based on a video game portfolio, the translator delivered an Excel file, and the project manager did a quick formal review and delivered the software strings to the customer. In-app localization testing was done on the client side. Moreover, it was not only our quality control that was immature, but also the client’s internationalization process: most localization kits provided no context for software strings, even though many titles were hidden object games, where the difference between a (long)bow and a bow(tie) was critical.

In other words, we had it coming. In October 2011, the publisher received negative feedback from the gaming community on some of our German and French localizations, and complaints about the Chinese and Brazilian Portuguese versions followed in 2012. The errors covered the whole range: language, style, consistency and accuracy. A few times, players even described the localization as raw machine translation output. We admitted fault and set about finding a solution.

Checks and balances

First of all, we called every translator’s competence into question, dramatically expanded the translation team and completely retranslated all problematic content.

To filter out unprofessional freelancers, we launched a massive cross-checking campaign. We devised a competence assessment form (Figure 1) and organized peer checks of every translated chunk of content by two or three translators. The form allowed us to rate six translation competencies separately, each on a scale of 1 to 5: subject matter expertise, understanding of the source language, proficiency in the target language, style or literary competence, knowledge of regional standards, and compliance with instructions and procedures. Reviewers could also give error examples and overall recommendations. The project manager or vendor manager checked each form for validity and then imported it into our vendor database.
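To make the structure of the competence assessment form concrete, here is a minimal Python sketch of how one form might be stored in a vendor database. The six competencies and the 1-to-5 scale come from the text above; the class name, the field names and the choice to average marks across forms are hypothetical assumptions, not All Correct’s actual schema.

```python
from dataclasses import dataclass, field

# The six competencies listed in the article; the identifier names are hypothetical.
COMPETENCIES = (
    "subject_matter",
    "source_language",
    "target_language",
    "style",
    "regional_standards",
    "compliance",
)


@dataclass
class CompetenceAssessment:
    """One reviewer's assessment of one translator (hypothetical schema)."""

    translator: str
    reviewer: str
    marks: dict[str, int]  # competency -> mark on the 1-to-5 scale
    error_examples: dict[str, list[str]] = field(default_factory=dict)
    recommendation: str = ""

    def __post_init__(self) -> None:
        # Reject forms with missing or unknown competencies, or marks outside 1..5.
        missing = set(COMPETENCIES) - set(self.marks)
        if missing:
            raise ValueError(f"missing marks: {sorted(missing)}")
        for name, mark in self.marks.items():
            if name not in COMPETENCIES:
                raise ValueError(f"unknown competency: {name}")
            if not 1 <= mark <= 5:
                raise ValueError(f"{name}: mark {mark} is outside 1..5")


def current_grades(forms: list[CompetenceAssessment], translator: str) -> dict[str, float]:
    """Average each competency over all forms on file for one translator.

    Plain averaging is an assumption; the article does not say how ratings are updated.
    """
    own = [f for f in forms if f.translator == translator]
    if not own:
        return {}
    return {c: sum(f.marks[c] for f in own) / len(own) for c in COMPETENCIES}
```

Keeping one record per reviewer, rather than overwriting a single score, preserves the error examples that make each mark checkable later.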

As a result, only 24 of the 89 translators on the 2011 team (27%) remained in 2013; the other 73% were dropped as unprofessional. One of the freelancers who failed the check was someone supposedly named Stefan Jacob (a respected German translator, according to his CV), who was eventually exposed as an impostor whose real name was Heba Qudaih, living a thousand miles from Germany.

Ten German, ten French, 17 Chinese and 25 Brazilian Portuguese localizations were redelivered and fixed with urgent localization patches. This stemmed the flow of quality claims and stopped further damage to the publisher’s image. Fortunately, we have received no “machine translation” complaints since then.

Secondly, we introduced a testing procedure for new freelancers. To make competence assessment reasonably reliable and its results more or less reproducible, we prepared concise guidelines for all quality reviewers and organized and recorded a webinar for them.

Now, every translator candidate is given a test job that is checked by a regular translation team member, who completes a competence assessment form; the project manager then checks the form for validity. Red flags for an invalid assessment include competence marks lower than 5 with no error examples, or a low target language rating when only style issues have actually been detected.
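The two red flags described above are mechanical enough to check automatically before a project manager looks at a form. The following is a minimal sketch under stated assumptions: the function name and argument shapes are hypothetical, and “rated low” is taken to mean a mark below 4, a threshold the article does not specify.

```python
def validity_red_flags(
    marks: dict[str, int],
    error_examples: dict[str, list[str]],
    detected_issue_types: set[str],
    low_mark: int = 4,
) -> list[str]:
    """Return reasons why a competence assessment form looks invalid (empty if none)."""
    flags = []
    # Red flag 1: a mark below 5 with no error example to back it up.
    for competency, mark in marks.items():
        if mark < 5 and not error_examples.get(competency):
            flags.append(f"{competency}: mark {mark} given without error examples")
    # Red flag 2: target language rated low although only style issues were found.
    # "Low" is assumed to mean below `low_mark` (hypothetical default of 4).
    only_style = bool(detected_issue_types) and detected_issue_types <= {"style"}
    if marks.get("target_language", 5) < low_mark and only_style:
        flags.append("target_language rated low, but only style issues were detected")
    return flags
```

Such a check can only flag a form for human attention; deciding whether the assessment is actually invalid remains the project manager’s call.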

Next, the candidate is given a pilot (real) job, which is also checked; this time, the reviewer fills out not only a competence assessment form but also a quality check form (Figure 2). The project manager checks both forms for validity. If no major errors are found in the translation, the candidate becomes a regular team member. Competence assessment does not end there, however: every fifth job of 1000 words or more is peer-reviewed, a competence assessment form is filed in the database and the freelancer’s current rating is updated. In this way, 170 new translators and quality reviewers had joined the team by 2014.
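The ongoing review rule (every fifth job of 1000 words or more) amounts to a simple counter. This sketch assumes the count is kept per translator and that shorter jobs neither trigger a review nor advance the counter; both are our reading, and the class name is hypothetical.

```python
from collections import defaultdict

REVIEW_MIN_WORDS = 1000  # from the article: jobs of 1000 words or more
REVIEW_EVERY_NTH = 5     # from the article: every fifth such job


class PeerReviewScheduler:
    """Hypothetical helper deciding which completed jobs go to peer review."""

    def __init__(self) -> None:
        # translator -> number of qualifying (>= 1000-word) jobs seen so far
        self._eligible_counts: dict[str, int] = defaultdict(int)

    def needs_peer_review(self, translator: str, word_count: int) -> bool:
        """Record a finished job and say whether it should be peer-reviewed."""
        if word_count < REVIEW_MIN_WORDS:
            return False  # short jobs do not count toward the cycle (assumption)
        self._eligible_counts[translator] += 1
        return self._eligible_counts[translator] % REVIEW_EVERY_NTH == 0
```

Counting per translator guarantees that each freelancer’s rating is refreshed at a steady rate, regardless of how busy the rest of the team is.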

The vendor managers keep an eye on the grades and notify the project manager of any significant grade changes for familiar translators. Every month, the head of production checks whether any project manager has assigned jobs to translators with low grades and follows up accordingly. To reward talent and widen the pool of options, project managers give every hundredth translation job to a fresh translator who has so far passed only the entrance test.
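Both oversight rules above can be sketched in a few lines. In this hypothetical sketch, a dispatcher routes every hundredth job to the next newcomer in line, and a monthly audit lists assignments to translators whose grade falls below a threshold. The threshold of 4 is borrowed, as an assumption, from the assignment rule described in the next paragraph, and a single overall grade per translator is a simplification.

```python
from collections import deque

FRESH_EVERY_NTH = 100  # from the article: every hundredth job goes to a newcomer


class Dispatcher:
    """Hypothetical helper that routes every 100th job to a newcomer."""

    def __init__(self, newcomers: list[str]) -> None:
        self.jobs_assigned = 0
        # Newcomers who have passed only the entrance test, in the order they qualified.
        self.newcomers = deque(newcomers)

    def assign(self, preferred: str) -> str:
        """Return who gets the next job: normally the PM's preferred translator."""
        self.jobs_assigned += 1
        if self.jobs_assigned % FRESH_EVERY_NTH == 0 and self.newcomers:
            return self.newcomers.popleft()
        return preferred


def monthly_low_grade_audit(
    assignments: list[tuple[str, str]], grades: dict[str, float], threshold: float = 4.0
) -> list[tuple[str, str]]:
    """List (job_id, translator) pairs where the translator's grade is below threshold.

    Translators with no grade on file are flagged as well.
    """
    return [(job, who) for job, who in assignments if grades.get(who, 0.0) < threshold]
```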

Thirdly, we overhauled the localization workflow. Before assigning a job to an unfamiliar translator, the project manager makes sure that his or her target language and subject matter grades are 4 or higher. All gaming translations are now done in a server-based translation environment, which greatly improves terminological and stylistic consistency. In addition to the translation itself, every translator submits a glossary update, which is peer-checked against a dedicated checklist before being added to the main project glossary. This further reduces the risk of terminology defects.
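The pre-assignment rule for unfamiliar translators reduces to a threshold check on two competencies. A minimal sketch, with hypothetical function names and the competency keys used in a simple grades dictionary:

```python
MIN_GRADE = 4  # from the article: "4 or higher" on the 1-to-5 scale
REQUIRED = ("target_language", "subject_matter")


def may_assign_unfamiliar(grades: dict[str, float]) -> bool:
    """True if an unfamiliar translator meets the minimum grades for a job.

    A missing grade counts as failing the check.
    """
    return all(grades.get(c, 0) >= MIN_GRADE for c in REQUIRED)


def eligible_candidates(candidates: dict[str, dict[str, float]]) -> list[str]:
    """Names of candidates who pass the grade check, sorted alphabetically."""
    return sorted(name for name, g in candidates.items() if may_assign_unfamiliar(g))
```

Treating a missing grade as a failure errs on the side of caution: a translator with no assessment on file has not yet demonstrated either competency.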

Every translation job of 1500 words or more is submitted to a peer translator for a quality check, and the project manager reviews the resulting quality check form. If the project manager suspects an invalid assessment, or if the quality reviewer has a short track record, a secondary check is performed. Ideally, quality review takes place before the localization kit is submitted to the publisher for integration into the build. If the deadline is tight, the quality check can be performed after translation delivery (but before the game build is compiled) or, in rare cases, even after the first round of in-app localization testing, so that any corrections are tested during the regression testing iterations. This minimizes the risk of language, style and accuracy defects. There is no standard sample size for quality review; instead, a time slot of one to four hours is allocated for each quality check.
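The article sets two word-count triggers for quality review: any single job of 1500 words or more, and small translations once a single translator’s accumulated total reaches 1500 words. A minimal sketch of both, plus the secondary-check condition, follows. The class and function names are hypothetical, and “short track record” is modeled, purely as an assumption, as fewer than 10 completed reviews.

```python
from collections import defaultdict

QC_THRESHOLD_WORDS = 1500  # from the article


class QualityCheckTracker:
    """Hypothetical helper deciding when jobs are sent for a quality check."""

    def __init__(self) -> None:
        self._pending: dict[str, list[str]] = defaultdict(list)  # translator -> small job ids
        self._pending_words: dict[str, int] = defaultdict(int)   # translator -> words waiting

    def submit(self, translator: str, job_id: str, word_count: int) -> list[str]:
        """Return the job ids to send for a quality check now (possibly empty)."""
        if word_count >= QC_THRESHOLD_WORDS:
            return [job_id]  # large jobs are checked individually
        # Small jobs accumulate per translator until they add up to the threshold.
        self._pending[translator].append(job_id)
        self._pending_words[translator] += word_count
        if self._pending_words[translator] >= QC_THRESHOLD_WORDS:
            batch = self._pending.pop(translator)
            del self._pending_words[translator]
            return batch
        return []


def needs_secondary_check(
    suspected_invalid: bool, reviewer_completed_reviews: int, min_track_record: int = 10
) -> bool:
    """Second opinion if the PM doubts the form or the reviewer is new (10 is hypothetical)."""
    return suspected_invalid or reviewer_completed_reviews < min_track_record
```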

Small translations are checked once a single translator’s total word count reaches 1500 words. The error categories used in the localization quality check form correspond strictly to those of the Multidimensional Quality Metrics (MQM) Version 2, developed by QT LaunchPad, a European Commission-funded initiative (for details see www.qt21.eu/launchpad/content/delivered).

After the client compiles the game build, our team of localization testers plays the game and logs all localization-related bugs in a bug tracking system such as JIRA or Redmine. At this stage we detect truncated strings; untranslated source text that was added to the game at later development stages and was not included in the localization kit; compilation errors, such as chunks of text in the wrong language (Figure 3); and grammatical agreement problems that surface once translated strings are seen in context. Most bugs are fixed by the translation team, the corrected strings are resubmitted, and regression testing iterations follow until no defects remain.

Finally, we worked with our customers to improve their in-house software internationalization processes. By 2013, a localization kit without images had become a rarity, whereas in 2011 kits typically had none. Clients now often provide software builds so that our team can get immersed in the game before starting the translation, and localization kits often come in a logically structured order. The development, translation and testing teams actively use shared question-and-answer sheets.

Some outcomes

As a result of our quality assurance efforts, the number of valid quality claims and translation withdrawals relative to video game localization revenue dropped by 51% in 2013 compared with 2012. The share of translation accuracy errors fell from 61% in 2012 to 29% in 2013, and the share of typos from 11% to 4%. Root cause analysis of localization defects also showed that the share of errors caused by poor workflow planning or vendor management fell from 20% in 2012 to 5% in 2013.
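Assuming that the claims figure is normalized by dividing each year’s count of valid claims and withdrawals by that year’s localization revenue (our reading; the article does not give the exact definition), the 51% drop can be written as follows. Here $C_y$ is the number of valid claims and withdrawals and $R_y$ the video game localization revenue in year $y$:

```latex
r_y = \frac{C_y}{R_y}, \qquad
\frac{r_{2013} - r_{2012}}{r_{2012}} = -0.51
\quad\Longrightarrow\quad
r_{2013} = 0.49\, r_{2012}
```

Because the rate is normalized by revenue, it can fall even if the absolute number of claims stays flat or rises, provided revenue grows faster.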

As for the client mentioned earlier, our partnership expanded in 2013: we translated content for 54 video game titles into as many as 28 target languages, compared with 25 titles and up to 14 languages in 2011.

Although the quality of our multilanguage video game localization has improved dramatically over the last three years, challenges remain. We have organized a few webinars for our translation teams on memoQ tips, terminology work and translator competence assessment, but there are many other skills to develop, and some 36 training activities are planned for 2014.