Improving translation of variables in interactive games

By Janaina Wittner August 16, 2011

The more a player becomes immersed in a game, the more successful the game is. Two factors determine the extent of this immersion: increasingly realistic graphics and more in-depth textual or oral interaction. This is why games require real-time interaction with users through complex, accurate and natural-sounding language. Since it is practically impossible to plan for every linguistic combination in a language, variables are used to represent the player’s actions and environment. Localization vendors, however, are powerless to solve the coding issues related to the translation of these variables. As a result, distributors can end up with poor localization quality, which leads to a loss of immersion and poor market penetration. It is time for the community to develop a common standard and next-generation shareware to tackle these issues so that it can focus on the game.

The game localization process

Dialogues and interaction within games are complex because of non-linear storyboards and multiplayer interaction. The text presented to players is built dynamically from human-machine dialogue scripts by engines that generate phrases in real time according to context variables.

Once created, these dynamic and interactive variables need to be localized into other languages. Because players in other countries expect the same quality as players in the game’s country of origin, localized versions should contain no grammatical mistakes or truncations. They should also address players according to their gender, where applicable, and accommodate local idiomatic and cultural characteristics.

In simple games with little dynamically generated text, the localization process consists of extracting all user interface (UI) strings and phrases in the source language, translating them into the target language (TL) and re-integrating them into the game. This sounds straightforward enough.
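For a game with little dynamic text, this workflow amounts to a lookup table keyed by string ID. The following Python sketch is illustrative only: the IDs, the French strings and the `reintegrate` helper are hypothetical, and real pipelines usually exchange such tables as files (XLIFF, spreadsheets or engine-specific formats) rather than as in-code dictionaries.

```python
# Source-language UI strings, keyed by hypothetical string IDs.
source_strings = {
    "MENU_START": "Start game",
    "MENU_OPTIONS": "Options",
    "MENU_QUIT": "Quit",
}

# Translations delivered by the localization team, keyed by the same IDs.
french_strings = {
    "MENU_START": "Commencer la partie",
    "MENU_OPTIONS": "Options",
    "MENU_QUIT": "Quitter",
}


def reintegrate(source: dict[str, str], translated: dict[str, str]) -> dict[str, str]:
    """Build the target-language table, falling back to the source text
    for any ID that has not been translated yet."""
    missing = [key for key in source if key not in translated]
    if missing:
        print("Untranslated IDs:", ", ".join(missing))
    return {key: translated.get(key, text) for key, text in source.items()}


ui = reintegrate(source_strings, french_strings)
print(ui["MENU_QUIT"])  # Quitter
```

Because every string here is a complete, fixed phrase, each one can be translated in isolation; the difficulties described next arise once phrases are assembled from variables.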

However, this process becomes extremely complex for interactive games that use variables and interactivity scripts. Localizing such games involves translating strings that contain cryptic scripts, identifying all the variables and their potential values, and translating those values into the TL. Most games are written in English or Asian languages, whose grammars involve little agreement in gender, number or case. This makes translation harder into languages in which grammatical agreement varies with case, number and gender.

Complexity in localizing dynamic games can limit the number of countries in which a game can be marketed. The alternatives are to force players to use English or to release poor-quality localized versions, resulting in poor market penetration or, worse, negative buzz. Some distributors work around the problem by simplifying dialogues to avoid grammatical issues, which reduces quality and the level of immersion. Others invest tremendous amounts of money, effort and time to resolve the issue internally. The problem is then solved for the games they develop in-house but not for third-party games they may decide to publish.

There is, therefore, a strong need for developers, distributors, localizers and end users to study the issues related to localizing interactive, multilingual, context-dependent games from every stakeholder’s perspective. We also need to understand the complexity and specific nature of the issue, to analyze how the various actors are addressing or working around the problem and, finally, to determine how the whole application development community can work together to define a common method of solving it.

What we are facing

Today, game translators are not handed full sentences. They work with coded fragments that do not always make sense on their own. Without additional information, translators have little idea of the context in which phrases such as $avatar$ must find the %color %item are generated. To localize these sentences, translators need self-explanatory variables that tell them what each field, string or value represents (Table 1). Unfortunately, variables are more commonly highly cryptic (Table 2). Much time is then spent in discussions with the development team to clarify the definition and values of each variable, and linguistic bugs are left to be fixed during the quality assurance cycle.
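One hypothetical way to make placeholders self-explanatory is to ship each one with a short description and sample values that a translation tool could display as a note next to the string. The `VariableInfo` class and its fields below are assumptions for illustration, not part of any existing standard; the sample values are taken from the example sentence.

```python
from dataclasses import dataclass, field


@dataclass
class VariableInfo:
    """Hypothetical metadata a developer could attach to each placeholder."""
    placeholder: str
    description: str
    sample_values: list[str] = field(default_factory=list)

    def translator_note(self) -> str:
        """One-line note telling the translator what the placeholder means."""
        samples = ", ".join(self.sample_values)
        return f"{self.placeholder}: {self.description} (e.g. {samples})"


variables = [
    VariableInfo("$avatar$", "name of a playing or non-playing character",
                 ["Queenie", "Mousy", "Hippo", "Frog"]),
    VariableInfo("%color", "one of the game's predefined colors",
                 ["blue", "mauve", "pink", "beige"]),
    VariableInfo("%item", "an object in the game environment",
                 ["knife", "tree", "pumpkin", "anvil", "coins"]),
]

for var in variables:
    print(var.translator_note())
```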

Even with all the information in hand, variables and their values are difficult to handle because of basic grammatical features such as gender, number, case and plural forms, which work differently from one language to another. For example, French has four forms of the definite article (le, la, l’ and les) while English has just one (the). In French, adjectives must agree with the noun in gender and number. Russian has two plural forms, while Finnish has 14 different groups into which you can sort almost every noun and adjective. So a simple sentence in English may have many possible translations depending on context.
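The number-dependent part of this problem can be made concrete. The following Python sketch chooses a Russian word form from a count using the integer plural rules published in the Unicode CLDR, which distinguish the categories one, few and many for whole numbers; English needs only one and other. The word table for пуля (bullet) is an illustrative assumption, and fractional numbers, which CLDR places in a further category, are not handled.

```python
def russian_plural_category(n: int) -> str:
    """CLDR plural category of a non-negative integer in Russian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1, 21, 31, ... (but not 11)
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"    # 2-4, 22-24, ... (but not 12-14)
    return "many"       # 0, 5-20, 25-30, ...


# Hypothetical word table: one form of "bullet" per plural category.
BULLET_FORMS = {"one": "пуля", "few": "пули", "many": "пуль"}


def bullets(n: int) -> str:
    return f"{n} {BULLET_FORMS[russian_plural_category(n)]}"


for n in (1, 3, 5, 11, 21, 22):
    print(bullets(n))
```

An engine that only appends "X 3" to a noun, as in the bullet example quoted later in this article, cannot produce these forms; the count has to be available when the word is chosen.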

For example, in the original English encoded sentence $avatar$ must find the %color %item, the variable $avatar$ is any playing or non-playing character (such as Queenie, Mousy, Hippo or Frog), %color is one of the game’s predefined color values (such as blue, mauve, pink or beige) and %item is an item in the game environment (such as knife, tree, pumpkin, anvil or coins). Typical outputs from the original English sentence would be Queenie must find the blue tree, Alan must find the red knife or Hippo must find the orange pumpkins.
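An English-oriented engine typically fills such a template by plain string substitution. The Python sketch below does exactly that for the $name$ and %name notation used in this example; the regular expression and the `render` function are assumptions made for illustration. Fed a word-for-word French template, the same substitution produces the incorrect sentence discussed next.

```python
import re

# Matches the two placeholder styles used in the example: $name$ and %name.
PLACEHOLDER = re.compile(r"\$(\w+)\$|%(\w+)")


def render(template: str, values: dict[str, str]) -> str:
    """Replace every placeholder with its value; no grammar is applied."""
    def substitute(match: re.Match) -> str:
        name = match.group(1) or match.group(2)
        return values[name]
    return PLACEHOLDER.sub(substitute, template)


english = render("$avatar$ must find the %color %item",
                 {"avatar": "Queenie", "color": "blue", "item": "tree"})
print(english)  # Queenie must find the blue tree

# A word-for-word French template keeps the English word order and article.
french = render("$avatar$ doit trouver le %color %item",
                {"avatar": "Queenie", "color": "bleu", "item": "arbre"})
print(french)  # Queenie doit trouver le bleu arbre (incorrect)
```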

In French, a direct translation of the first example would be Queenie doit trouver le bleu arbre, which is incorrect. The correct translation is Queenie doit trouver l’arbre bleu. The definite article depends on the gender of the item (le, la), its number (les) and its first letter (l’ before a vowel or a mute h). Color adjectives come after the noun, not before, as in l’arbre bleu. Adjectives must also agree with the noun in gender and number, as in la fleur bleue, l’arbre bleu, les couteaux bleus and les fenêtres bleues, though some adjectives are identical in both genders, as in la fleur rouge, l’arbre rouge, les couteaux rouges and les fenêtres rouges.

The French encoded sentence should therefore be $avatar$ doit trouver [DefiniteArticleValue] %item%color, where $avatar$ is the name of the selected character (Queenie, Mousy, Hippo, Frog) and [DefiniteArticleValue] is a tag that displays the correct article, agreeing in gender and number with the noun that follows. %item is an item in the game environment (knife, tree, pumpkin, anvil, coins), but each of its values is tagged with its gender and number. %color is one of the game’s predefined color values (blue, mauve, pink, beige), and each of these is associated with its four forms: masculine and feminine, singular and plural.
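The tagging scheme described above can be sketched in Python. Each %item value carries its gender, its number and whether the article elides, and each %color value carries its four forms; the generator then chooses the article and places the agreeing adjective after the noun. The data layout and function names are hypothetical, and the elision flag is stored per noun rather than derived from the first letter, because words beginning with h do not all behave alike.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    text: str
    gender: str   # "m" or "f"
    plural: bool
    elides: bool  # True if le/la becomes l’ (vowel or mute h)


# Hypothetical tagged values for %item.
ITEMS = {
    "knife": Noun("couteau", "m", False, False),
    "tree": Noun("arbre", "m", False, True),
    "pumpkin": Noun("citrouille", "f", False, False),
    "anvil": Noun("enclume", "f", False, True),
    "coins": Noun("pièces", "f", True, False),
}

# Four forms per color: masculine sg, feminine sg, masculine pl, feminine pl.
COLORS = {
    "blue": ("bleu", "bleue", "bleus", "bleues"),
    "mauve": ("mauve", "mauve", "mauves", "mauves"),
    "pink": ("rose", "rose", "roses", "roses"),
    "beige": ("beige", "beige", "beiges", "beiges"),
}


def definite_article(noun: Noun) -> str:
    """The value of the [DefiniteArticleValue] tag for this noun."""
    if noun.plural:
        return "les "
    if noun.elides:
        return "l’"
    return "le " if noun.gender == "m" else "la "


def agree(color: str, noun: Noun) -> str:
    """Pick the color form matching the noun's gender and number."""
    index = (2 if noun.plural else 0) + (1 if noun.gender == "f" else 0)
    return COLORS[color][index]


def french_sentence(avatar: str, color: str, item: str) -> str:
    noun = ITEMS[item]
    # Article, then noun, then the agreeing adjective after the noun.
    return f"{avatar} doit trouver {definite_article(noun)}{noun.text} {agree(color, noun)}"


print(french_sentence("Queenie", "blue", "tree"))   # Queenie doit trouver l’arbre bleu
print(french_sentence("Hippo", "blue", "coins"))    # Hippo doit trouver les pièces bleues
print(french_sentence("Frog", "pink", "pumpkin"))   # Frog doit trouver la citrouille rose
```

The grammatical knowledge lives in the tagged data rather than in the template, so adding an item or a color only requires supplying its tags or forms.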

Translating variables without any contextual information about where they will be displayed, as in Show %Item {table}, is also a problem, because some words have several meanings in other languages. For example, table can be translated as стол or таблицу in Russian. Стол refers to a piece of furniture at which we eat, while таблицу refers to a form or list of words and numbers. Hence, Show %Item does not give the translator enough context in this particular case.
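One common remedy is to key each translation on the source text plus a short context string, which is the approach GNU gettext takes with its msgctxt field (exposed in Python as `gettext.pgettext`). The sketch below mimics that idea with a plain dictionary; the context labels and the catalog are hypothetical. Both Russian values are accusative forms, as required after a verb such as Show.

```python
# Hypothetical Russian catalog keyed by (context, source text), similar in
# spirit to gettext's msgctxt: the same English word maps to different
# translations depending on what it refers to.
RUSSIAN_CATALOG = {
    ("furniture", "table"): "стол",
    ("data display", "table"): "таблицу",
}


def translate(context: str, text: str) -> str:
    """Look up the translation of text in the given context."""
    key = (context, text)
    if key not in RUSSIAN_CATALOG:
        raise KeyError(f"no translation for {text!r} in context {context!r}")
    return RUSSIAN_CATALOG[key]


print(translate("furniture", "table"))     # стол
print(translate("data display", "table"))  # таблицу
```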

English is also simpler in how it addresses players: the pronoun you does not require gender agreement. In other languages, however, players may need to be addressed in their own gender for full immersion. A female player would not feel fully involved in the game if the wrong gender were systematically used when addressing her. For example, Congratulations! You are a champion could have multiple forms. The masculine form in French would be Félicitations! Tu es un champion, while the feminine form would be Félicitations! Tu es une championne. Likewise, You have chosen the magician’s role would be Vous avez choisi le rôle de magicien in the masculine form and Vous avez choisi le rôle de magicienne in the feminine form.
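In practice, each player-facing message may need one variant per grammatical gender, selected at run time from a player attribute. The Python sketch below uses the French examples above; the message IDs, the `Gender` enum and the assumption that the game knows which form the player wants (for example from a profile setting) are all hypothetical.

```python
from enum import Enum


class Gender(Enum):
    MASCULINE = "masculine"
    FEMININE = "feminine"


# Hypothetical French message table: one variant per grammatical gender.
MESSAGES_FR = {
    "CHAMPION": {
        Gender.MASCULINE: "Félicitations! Tu es un champion",
        Gender.FEMININE: "Félicitations! Tu es une championne",
    },
    "MAGICIAN_ROLE": {
        Gender.MASCULINE: "Vous avez choisi le rôle de magicien",
        Gender.FEMININE: "Vous avez choisi le rôle de magicienne",
    },
}


def address_player(message_id: str, player_gender: Gender) -> str:
    """Return the variant of a message that matches the player's gender."""
    return MESSAGES_FR[message_id][player_gender]


print(address_player("CHAMPION", Gender.FEMININE))
# Félicitations! Tu es une championne
```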

In most cases, developers create an English (or Asian) game using an English-oriented engine, and publishers are responsible for funding the localization process. This makes the entire production chain particularly complex and significantly increases costs. The developer designs the engine in English, invents his or her own coding rules using $, % and so forth, with varying levels of effectiveness, and corrects linguistic or comprehension issues at a very late stage, based on bug reports. The publisher has to invest in localization, achieving average results for languages whose grammar is similar to that of the source language and unsatisfactory results for others. The generated phrases are either grammatically poor and unnatural or, worse, completely incorrect and incomprehensible.

The localization team then faces an extremely complex task, since it is expected to translate each string and re-create tables of variable values with little contextual information. Translator productivity is low, quality is difficult to assess before the integration phase and the validation process takes longer. The localization team has no way to fix linguistic issues in the code. Finally, players are required to either interact in English or tolerate sentences such as “You need [bullet] X 3 to do that!”

The only way around these issues today is to plan localization upstream, during the design phase, and to use a language-independent method of encoding model phrases. This is highly complex, very costly, quite often inefficient and almost never standardized.

The solution is for all stakeholders in interactive, context-dependent games to seek a standardized way of handling the problem and then to develop the related localization tools and environment.

The need for joint effort throughout the community

The game localization community as a whole has a growing need for concerted action and innovative solutions. Several actors have already taken the initiative to build a core group, open to all stakeholders, whose actions include a technical study of the situation in the game industry and beyond, the definition of a standard, the development of localization tools and the launch of innovative joint research and development projects. The objectives of this initiative are to structure the game community (companies, researchers, universities and experts) to support a task force that shares information and establishes a standard, to produce the tools and the appropriate environment, and to create and obtain funding for further collaborative projects.

This initiative will provide important financial, technical and scientific benefits to all stakeholders. It will affect the technical and financial aspects of games by introducing localization and multilingualism at a very early stage of development and by making developers’ work easier through a commonly adopted standard. It will help simplify techniques in the development environment and save a great deal of time, which is vital in an industry where time to market is critical, as well as money, since there will be no more compromises between translation time and quality. It should also ensure the release of 100% linguistically accurate games from the design phase onward, accepted immediately by the players’ community without any negative buzz caused by linguistic issues spoiling the entertainment experience. In this way, it increases value for the developer and expands the publisher’s potential market, allowing a rollout in several countries without costly localization investment or trade-offs. Lastly, it will seek ingenious and effective solutions that combine human and machine translation, with the objective of achieving quality for customers and users alike.

Last summer, at the initiative of the first core group, this idea was presented to the European Telecommunications Standards Institute (ETSI) and its Human Factors technical committee, which deals with non-technical issues such as user-application interaction and language-related topics. The committee officially endorsed the initiative in October and launched a technical study to analyze the situation and define future solutions for all “context-dependent multilingual communications for interactive applications.”

The ultimate objective of the task is to define one standard for each group of languages, to be used by developers and publishers. This would ensure that generated phrases are accurate in each target language covered by a standard, since the grammatical issues associated with a particular language would be coded in a uniform manner.

This ongoing task at ETSI is open to collaboration by any stakeholder. It involves identifying the technical and scientific elements required and assessing whether they might already exist in other research or development projects. It should also identify what can be integrated, adapted or developed.

The official technical report, expected this fall, should provide a full set of state-of-the-art recommendations for using existing elements, list the components missing from the environment and launch the definition of the standard.

This initiative seeks to define a localization environment that would support translators in localizing interactivity scripts and their associated context-dependent variables, along with a verification environment and an editing tool that developers, publishers and translators could use for post-editing and for validation against the standard during development and localization. The objective is to provide a complete set of tools forming a comprehensive environment and to simplify the design of multilingual game interactivity through a common standard.
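One check that such a verification environment could run automatically is placeholder consistency: every variable in the source phrase should appear in the translation, and the translation should not introduce unknown ones. The Python sketch below assumes the $name$ and %name notation used earlier in this article; the function names are hypothetical. Bracketed tags such as [DefiniteArticleValue] are deliberately ignored, since a target language may legitimately add them.

```python
import re

# Matches the two placeholder styles used in the examples: $name$ and %name.
PLACEHOLDER = re.compile(r"\$(\w+)\$|%(\w+)")


def placeholders(template: str) -> set[str]:
    """Names of all variables used in a template."""
    return {a or b for a, b in PLACEHOLDER.findall(template)}


def check_translation(source: str, target: str) -> list[str]:
    """List variables missing from, or unknown to, the translated template."""
    src, tgt = placeholders(source), placeholders(target)
    problems = [f"missing variable: {name}" for name in sorted(src - tgt)]
    problems += [f"unknown variable: {name}" for name in sorted(tgt - src)]
    return problems


source = "$avatar$ must find the %color %item"
print(check_translation(source, "$avatar$ doit trouver [DefiniteArticleValue] %item%color"))
# []
print(check_translation(source, "$avatar$ doit trouver le %couleur %item"))
# ['missing variable: color', 'unknown variable: couleur']
```

A check like this catches translated or mistyped variable names before integration, rather than during the quality assurance cycle.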

One crucial aspect of the initiative is that, once the research has been done and the relevant environment developed, the results will be offered to all of the stakeholders within the community. Stakeholders in turn should help provide these tools and environment as shareware to improve interoperability.