Can MT play the game?

By Jie Jiang and Antonio Tejada May 14, 2015

The European Union (EU) funded Online Service for Subtitling by Machine Translation (SUMAT) project, which came to a close at the end of 2014, explored the long-term feasibility of using machine translation (MT) to translate subtitle files. Although the project’s engines were not originally created for games, there was clearly an opportunity to use these topic-specific engines to evaluate the possible impact of MT on the subtitling of games, as part of the process of streamlining localization efforts.

As these technologies developed and became available for use and experimentation, their application in other areas came under discussion. Gaming is often seen as the poor relation to traditional mediums such as television and cinema, but the industry is quickly outgrowing its humble roots both financially and in terms of the sophisticated technology at its disposal. Facebook’s acquisition of Oculus VR for $2 billion last year is one example of how gaming technology is driving a new funding spree.

When samples were run through the SUMAT system, it became apparent that it might be worth applying the results of this project to the subtitling tasks that some game developers regularly face in the production of multilingual games. This would clearly be an ambitious undertaking, requiring a significant level of adaptation to a language style that tends to be full of slang and colloquial phrases, and that can change from game to game.

There are significant doubts about applying this type of technology outside areas with very structured content, where language is quite repetitive and predictable in its sentence structure and use. Even so, the move from the typical MT environment of written language to oral language applications such as subtitling holds some fascinating potential.

Customizing engines and corpus

If we take subtitling as a special case of oral language, we do obtain significant benefits from MT. Generally speaking, subtitles are considerably easier for machines to handle. This is partly because more reliable information is available about subtitle files, such as domain, text genre or even the producers of the subtitles, which machines can use for better modeling. In addition, subtitles are much less irregular than other forms of oral language such as spontaneous speech (live commentary, for example), where predicting the upcoming text is extremely difficult.

The SUMAT project collected vast amounts of parallel data (about seven million subtitles) and monolingual data (about 40 million subtitles) for modeling. More importantly, the quality of the data was assured by professional subtitlers, which underpinned the quality of the MT output and differentiated SUMAT services from other “free” corpus options. In addition, the detailed information on subtitle sources, domains and text genres was organized specifically to take slang and colloquial language into account.

Studies of post-editing production rates showed that an improvement of up to 30% was achieved simply by deploying SUMAT MT engines in the subtitling workflow. The quality of the MT output is undoubtedly good enough to benefit subtitle translation companies. Therefore, if we accept that there are similarities between subtitles and gaming content such as dialogue, SUMAT technology can be tested for this new type of application.

Although subtitles follow a more specific structure and could be described as written communication rather than oral, we can still foresee a significant level of difficulty when applying this idea to gaming dialogue. As we know, the purpose of subtitles is to convey the original information of the soundtrack to the audience, with its distinctive colloquial, time-and-space-limited and culture-specific features. It seems difficult to expect an MT system to identify these three key factors automatically and then deliver a suitable translation.

Peter Newmark, a well-known translation theorist, introduced functional grammar and cross-cultural communicative theory into his studies of translation, and proposed that the two are not contradictory. This means that the proper translation strategy for a specific text should be determined on the basis of text typology, the purpose of the translation, the intention of the author and the target readership.

Challenges

With machine translated text, there are many known issues that even state-of-the-art MT systems cannot cope with, but we are not relying solely on the MT output for quality. In the SUMAT project, the aim of MT is not to replace human translators, but to improve their productivity.

In production, we do not particularly criticize MT output for very difficult input sentences, since these are meant to be reviewed by post-editors who will use their expertise to produce more suitable translations. This scenario makes more sense with the deployment of an automatic quality estimation module, which behaves similarly to fuzzy-match scores from translation memories: it helps post-editors estimate the effort needed to bring a translated segment to publishable quality. They can then decide whether to discard poor MT output and translate from scratch. In this case, treating MT as a useful pre-translation tool for post-editors means that a better productivity gain can be achieved by reusing MT output in the post-editing stage. In return, we can achieve lower costs and a shorter turnaround time. This could certainly be applied to game localization.
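As a rough illustration of that routing decision, the following Python sketch sends each segment either to post-editing or to translation from scratch, based on an estimated quality score. The `Segment` class, the `qe_score` field, its 0-to-1 scale and the default threshold of 0.5 are all assumptions made for illustration; none of them is taken from the SUMAT system.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str       # original subtitle or dialogue line
    mt_output: str    # raw machine translation of that line
    qe_score: float   # hypothetical estimated quality in [0, 1]; higher is better


def route_segment(segment: Segment, threshold: float = 0.5) -> str:
    """Return 'post-edit' if the MT output looks reusable, else 'translate'."""
    if not 0.0 <= segment.qe_score <= 1.0:
        raise ValueError("qe_score must be between 0 and 1")
    return "post-edit" if segment.qe_score >= threshold else "translate"


def split_workload(segments, threshold: float = 0.5) -> dict:
    """Group segments by routing decision so each queue can be handled separately."""
    routed = {"post-edit": [], "translate": []}
    for seg in segments:
        routed[route_segment(seg, threshold)].append(seg)
    return routed
```

In practice the threshold would need to be tuned per language pair and per game, in much the same way that fuzzy-match bands are agreed for translation memories.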

On the other hand, challenges in the translation industry do influence MT research trends. Many research topics have emerged from actual problems encountered in practice, and a lot of this research has been effective in improving our understanding. At the beginning of the SUMAT project, the consortium identified many problems that could arise from using MT for subtitling and set up specific work packages to deal with them, such as building different types of statistical MT systems to tackle the challenges. However, instead of comparing each MT output in detail from a translation theory point of view, we were more interested in productivity gain. Thus all of the systems built were compared statistically and the best ones were selected to facilitate the translation workflow. Following the SUMAT research project, we are quite certain that the quality of MT output has been well tuned to achieve the best possible outcomes.

From a scientific point of view this is great, but what about the possible return on investment? Game developers are looking at new methods to achieve better efficiencies when translating in-game dialogue (not just as a “nice to have” feature, but also as an aid for users with hearing problems). According to last year’s SUMAT report, up to 56% of the MT output was ranked as requiring very little or no post-editing. Although very encouraging, this is arguably not sufficient to justify the risk of investment, and direct use of the MT content will still require some level of post-editing. As a game publisher faces a decision on whether to use a similar tool for its subtitling, it is crucial to understand what investment is required to make MT output “game ready.”

Post-editor feedback

Studying feedback on MT output has always been interesting. The feedback tends to be a mixture of objective errors and subjective ideas. However, in most cases, comments like “this is useless” are not very constructive. Therefore, to provide more meaningful feedback on MT output, a specific evaluation work package was carefully designed as part of the SUMAT project, so that both MT developers and users could benefit from valuable findings. Specifically, nine types of error were identified and collected in the evaluation stage, including semantic, syntactic and formatting errors, along with subtitle-specific issues such as “the contents are too long.” The results showed that the most common errors were mistranslations, followed by missing content and grammatical agreement inaccuracies.

These are typical errors that MT is generally blamed for, so in practice, review processes are essential, even for an MT system that produces translations with a very high level of reusability. On a brighter note, there are also many subjective comments that favor the SUMAT system. The feedback from the translators is that the post-editing process is noticeably different from the translation process, but once you get used to it, it can be amazing!

The post-editor experience is generally a non-linear function of MT engine quality. Once the engine quality drops below a certain level, post-editing MT output becomes a burden. It is therefore a higher priority to find the best way to use the corpus resources available in the gaming industry to raise MT engine quality to a strong level, before looking at specific errors to clean up the output. Once the MT quality reaches that point, the rest of the work will be much more straightforward to deal with and develop.

The future

We are faced with a market that is providing increasingly sophisticated gaming experiences for its users. As a result, there is a continuous rise in published content, and the growth of specific interactions with the player adds further challenges for game developers. We are now searching for a way for games to maintain and improve their international accessibility without having to rely on English as the lingua franca of the gaming experience.

It is easy to recognize the value of the work carried out by SUMAT. Although designed for the purpose of traditional film and television subtitling, the “topic-specific” engines can be adapted to provide a good platform for MT to help overcome language challenges in the gaming medium as well.

However, any application will require careful collaboration between the publisher and the technology owner. Engines can be trained to the language of a specific game in order to guarantee a level of output quality that could be beneficial to developers.