Machine Translation for Games
An interview with Mikhail Gorbunov
Yulia Akhulkova is a data scientist at Nimdzi, and graduated from Moscow University of Electronic Engineering as a software engineer. Since 2010 she has worked in localization, combining strategic, managerial, and marketing functions.
Russia-based Social Quantum is ranked among the top ten biggest game publishers in Russia and Eastern Europe. The mobile game developer has created such titles as Megapolis, Wild West: New Frontier, Dragon’s World, Ice Age World, and Poker Jet, and is in the process of making a few more new ones. The company has also been conducting successful experiments in the area of machine translation (MT) and machine translation post-editing (MTPE). Mikhail Gorbunov, head of localization at Social Quantum, shared his insight on the company’s MT initiative.
01 | What languages did you have in your MT initiative? Also, have you added new ones since you launched the MT?
We’ve always had 11 target languages: English, French, Italian, German, Spanish, Portuguese, Simplified Chinese, Traditional Chinese, Japanese, Korean, and Turkish. And we have not added new ones; we rarely add locales at all, preferring to translate projects into all languages simultaneously. We started experimenting with MT in August 2019, after talking with the guys from EA and Supercell. At first, it was a question of competition: if they can do it, why can’t we? In October 2019, the Russian-to-English pair on our existing projects switched completely to MTPE, with the exception of marketing materials. For those, transcreation is needed rather than translation. The rest of the languages switched to MTPE in February 2020. Before that, we played with all the MT engines we could get access to. Now we are experimenting with trainable engines, but the results leave much to be desired. It’s also worth mentioning that we translate games, where quality depends on the translator’s qualifications, the linguist’s desire to work on the project, adequate turnaround time requirements, and the quality of the original text. All these points are primarily tied to us, the localization managers.
Table 1: Translator estimates of time savings were as high as 30%, but varied by language and were difficult to measure.
02 | So you compared the MT engines on the principle of time spent on post-editing? Did you track how much time you actually saved?
Yes, we compared the engines based on this principle. But we also used a “like/dislike” approach when comparing them, as a “yes, I do want to work with this engine” kind of metric. After all the wanderings, the team chose DeepL, motivated mostly by the fact that it required the least amount of post-editing.
It’s worth mentioning here that we have a team of perfectionists. If a translator doesn’t like a phrase even slightly, they delete it and write the translation from scratch, performing human translation instead of post-editing. With DeepL, the number of segments rewritten from scratch is two to three times lower than with any other MT engine we have tried. But this does not mean that the overall number of rewritten segments is small. You should not perceive MT as a solution to all your problems, or DeepL as the king of MT engines suitable for any purpose.
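For anyone wanting to reproduce this kind of engine comparison, one rough way to put a number on the “rewritten from scratch” share is to measure how similar each final segment is to the raw MT output. The sketch below uses only Python’s standard library and is illustrative, not Social Quantum’s actual tooling; the 0.5 similarity cutoff and the example segments are my own assumptions.

```python
from difflib import SequenceMatcher

def rewritten_ratio(mt_segments, final_segments, cutoff=0.5):
    """Share of segments whose post-edited version keeps little of the MT.

    A segment counts as "rewritten from scratch" when its character-level
    similarity to the raw MT output falls below `cutoff` (an arbitrary
    illustrative value, not a metric the team describes using).
    """
    rewritten = sum(
        1
        for mt, final in zip(mt_segments, final_segments)
        if SequenceMatcher(None, mt, final).ratio() < cutoff
    )
    return rewritten / len(mt_segments)

# Hypothetical segments: the first was lightly post-edited, the second replaced.
mt = ["Collect 10 woods to build the house.", "The dragon are angry."]
final = [
    "Collect 10 pieces of wood to build the house.",
    "Beware: the ancient wyrm stirs in its lair.",
]
print(rewritten_ratio(mt, final))  # 0.5: one of the two segments counts as rewritten
```

Running this per engine over the same source batch gives a comparable per-engine figure, which is essentially what the “two to three times less” observation above quantifies.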
“What is also worth mentioning: we do not use MT on our new projects or non-standard tasks. We use MT only where a TM, a glossary, and a styleguide are available, and the linguists already know the project.”
03 | So exactly how cool was DeepL? Was its output 5% or 50% better as compared with other MT engines?
One way to measure this is to ask the translators. I collected translators’ feedback about our current MT experience, keeping their original wording. The results are in Table 1, although they’re all talking about time savings compared to translating from scratch.
The gained increase in translation speed is difficult to measure objectively for many reasons.
Source texts have different content and complexity, and fuzzy matches from the translation memory (TM) may differ. Also, things like a Mercury retrograde period or a linguist’s bad mood might change the outcome. So let’s just say the speed gain ranged from 5% to 50%, depending on these and other factors.
We actually have not tested the engines the way Intento does, for example. What we did was MTPE of our current games’ updates (10-1,500 source words per iteration), followed by an analysis of the results together with a team of translators and editors.
I really don’t like to complicate processes, especially when there is no clear goal behind doing so. In this case, as there was no ultimate goal of producing a set of MT evaluation criteria and strict metrics for academic or commercial research purposes, why would we do this? So we didn’t.
I was given a simple task: to reduce cost with minimal loss in quality, with timing of secondary importance. The most obvious solution seemed to be cutting out the editing step. Until then, we had a full translation, editing, and proofreading (TEP) cycle, with at least two translators working in each language pair. The problem was that without the editing step, the quality dropped noticeably: typos and misspellings went uncorrected, and actual errors became more frequent.
Let me stop here for a moment and tell you more about how we actually measure translation quality: there are major errors, and there are minor errors. Major ones either contain a serious factual, political, or cultural error, or prevent the player from performing the required action and correctly understanding the essence of the game task. Such errors are subject to immediate correction, as they affect the monetization and general image of the project. Minor errors encompass everything that has to do with grammar rules but will not drastically affect the gameplay or generate negative feedback from the users. These errors are fixed in the normal course of work.
In general, we adhere to an industry-wide rule for TEP: a maximum of one major error and three acceptable (minor) errors per 1200 words of the source text, excluding tags and placeholders — provided that the linguist made these errors without the influence of external factors, meaning they had the needed context or at least time for clarifying questions.
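Stated as a quick pass/fail check, the rule of thumb above might look like the following sketch. The function name and the assumption that the thresholds scale linearly with job size are mine, not Social Quantum’s actual QA procedure.

```python
def passes_tep_threshold(major_errors, minor_errors, source_words):
    """Check a job against the rule of thumb described above: at most
    1 major and 3 minor errors per 1,200 source words (tags and
    placeholders excluded from the word count). Linear scaling with
    job size is an illustrative assumption."""
    scale = source_words / 1200
    return major_errors <= 1 * scale and minor_errors <= 3 * scale

print(passes_tep_threshold(major_errors=1, minor_errors=3, source_words=1200))  # True
print(passes_tep_threshold(major_errors=2, minor_errors=1, source_words=1200))  # False: too many major errors
```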
So here comes what I like about MTPE: an acceptable MT output is easier for a translator to edit than to replace with another version written from scratch. The post-editors get less tired and make fewer mistakes, since they only need to assess how correctly the meaning is conveyed, and then adjust the wording if necessary. In this scenario, the number of both major and minor errors was 50% lower compared to the scenario in which a translator translates from scratch without further editing.
What is also worth mentioning: we do not use MT on our new projects or non-standard tasks. We use MT only where a TM, a glossary, and a styleguide are available, and the linguists already know the project.
“We started by discussing with each translator individually their view on MT and gave them time to adjust, to get used to it, to conduct their own research of engines on the market. Some even came with suggestions to try this or that engine!”
04 | So the whole process is currently “MT output + 1 translator”? Or is there also a separate editor? And by the way, what is the discount on MT for translators, if you can disclose it?
Yes, it’s now just MT + 1 post-editor. But we also perform an additional editing step as an exception for quality monitoring, if any doubts arise. And yes, we have implemented discounts, but they depend solely on the translator. For some it is a 30% reduction of the translation rate, while others are not ready to take a discount at all. In the beginning, we asked each future post-editor two simple questions:
- Are you ready to perform MTPE instead of translation on our projects?
- If yes, at what rate are you willing to be remunerated for this job?
And we did not demand immediate answers to these questions: we gave our linguists several months to “play” with the MT engines. The bottom line here is that the linguists themselves need to want to work with MT. You think we spent six months solely on testing the MT engines? Mostly, we spent this time on negotiations with the team, searching for compromises, agreeing on rates and discounts, establishing processes, and internal motivation.
We started by discussing with each translator individually their view on MT and gave them time to adjust, to get used to it, and to conduct their own research of the engines on the market. Some even came with suggestions to try this or that engine! In parallel, we were looking for MTPE specialists on LinkedIn. We then compared the post-editing done by our team with the post-editing by these specialists. Our team performed much better, because they are perfectionists and know their projects very well. As it turned out, the so-called MTPE specialists just slightly polished up the MT output so that it did not hurt the eye. Also, many of these “specialists” offer dynamic rates: for 5-7 cents you will get a decent text, very similar to a human translation, and for 3-4 cents you only get light post-editing. In the second case, the post-editing looks exactly like post-editing: the MT stays very visible behind some tiny edits.
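To give a sense of how a per-word discount plays out on a job, here is a toy calculation in integer cents. The word count and the 7-cent base rate are hypothetical; 30% is the discount one translator accepted, as mentioned above.

```python
def mtpe_cost_cents(word_count, rate_cents_per_word, discount_pct=30):
    """MTPE job cost in cents, applying a percentage discount to a
    classic per-word translation rate. Integer math avoids float
    rounding artifacts; the linear per-word model is an assumption."""
    return word_count * rate_cents_per_word * (100 - discount_pct) // 100

full = 10_000 * 7                  # 70,000 cents: full per-word translation rate
mtpe = mtpe_cost_cents(10_000, 7)  # 49,000 cents at a 30% MTPE discount
print(full - mtpe)                 # 21000 cents saved on this hypothetical job
```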
05 | Sounds familiar! Earlier you mentioned custom MT engines. Is there any progress there?
Indeed, we have successfully completed an unsuccessful test of one very famous engine. I will not name it here so as not to give it bad publicity, which its development team does not deserve: they helped us a lot and actually gave us the engine to play with for free. The effect of customization is simple: the MT output is a hodgepodge of the engine’s “intelligence” and the TM it was trained on. Unfortunately, the engine is not very smart to begin with, so as a result we get a poor MT with bits of phrases and terms from our projects.
Does it work? It does. But is it really usable? Well, not yet, in our case, since the wrong segments still have to be rewritten from scratch. And the nice tricks customization does add do not radically improve the situation.
To sum it all up:
- We perform MTPE on our older projects. We do not use it in marketing materials and new games.
- We do not use MT in projects with lots of phraseologisms, wordplay, and references to other works.
- The translation team is the same. Some translators left after the launch of the MT initiative, but most stayed to work with us.
- The linguists were already familiar with MT concepts and practices.
- As compared to TEP (for old projects), cost savings are significant.
- Custom MT engines can work, but they are of little use if the engine does not work well without training.