Character corruption in video games

Character corruption has always been a major issue in video game localization. So far, the video game industry has taken a mostly passive approach to solving this problem. Instead of proactively tackling the issues at the source and eradicating them completely, most companies rely on testers to flag individual issues as they are encountered during play tests.

This reliance on human testing makes for an extremely error-prone and risky approach, especially considering the sheer number of string updates most games go through with each build iteration; the number of languages video games are localized into nowadays; and the difficulty of triggering every single game string through a normal playthrough.

What makes these font issues so problematic is that they are likely to be flagged by first-party publishers (Microsoft, Sony or Nintendo) as must-fix issues, which means a submission could fail if the submitted build contains corrupted characters.

However, there are ways to ensure character corruption can indeed become a thing of the past.

How font support works in video games

Although every game may approach font textures and character support in slightly different ways, most games I have localized relied on font textures generated from font files. Font textures are basically sets of glyphs the game calls up by using a character’s hexadecimal Unicode value. This set of glyphs is normally generated by embedding the list of characters we want to support in the application the user interface (UI) team uses to convert the font file into font textures. If a given font file does not support the Spanish character ñ, for example, the generated texture will not contain a glyph for it and, as a result, this character will show up in-game as a square, a question mark or simply won’t appear at all. If a font file does not support a character we need, we have two options:

Change the font. This is extremely risky to do after localization quality assurance (QA) has started, since the font size could change, affecting all menus and forcing you to potentially have to retest the game from scratch.

Add the missing characters to the font. Adding a new character to a font is a solution most companies will want to avoid. You would either need to own the font and have an artist who can add new characters to it, which is a daunting prospect in the case of Asian languages, or you would need permission from the font creators to modify the font yourself. An alternate solution would be to get the font creator to add the missing characters directly, but this could be an expensive option.
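To make the lookup described above concrete, here is a minimal sketch of a font texture modeled as a table keyed by Unicode code point. The atlas layout, the cell size and the names build_atlas, lookup and FALLBACK_GLYPH are all assumptions made for illustration; a real engine will store its glyphs differently.

```python
# Hypothetical model: a font texture as a mapping from Unicode code point
# to a glyph rectangle (x, y, width, height) inside the texture.
FALLBACK_GLYPH = (0, 0, 16, 16)  # assumed "missing glyph" box in cell 0


def build_atlas(characters, cell=16, columns=16):
    """Assign each requested character a cell in a grid-shaped texture."""
    atlas = {}
    # Cell 0 is reserved for the fallback box, so real glyphs start at 1.
    for index, char in enumerate(sorted(set(characters)), start=1):
        row, col = divmod(index, columns)
        atlas[ord(char)] = (col * cell, row * cell, cell, cell)
    return atlas


def lookup(atlas, char):
    """Return the glyph rectangle for char, or the fallback box if absent."""
    return atlas.get(ord(char), FALLBACK_GLYPH)


atlas = build_atlas("abcdefghijklmnopqrstuvwxyz")
# ñ was never requested, so the game can only draw the fallback box.
print(hex(ord("ñ")), lookup(atlas, "ñ") == FALLBACK_GLYPH)  # 0xf1 True
```

Whether the fallback shows up as a square, a question mark or nothing at all depends on what the engine does when the lookup fails.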

You need a tool

When localizing your game, the only time you should rely on localization testers’ eyes is to improve the linguistic quality of your game. For any nonlinguistic issues, you should be relying on tools instead. There are ways to automate checks and fixes for most nonlinguistic issues in video game localization. Throughout the internationalization and localization of the video game projects I supervise, I normally use an average of five localization-specific tools. During my talk at the San Diego Game QA Localization Forum in December 2015, I presented four of these tools. In this article, I will focus on one of them, Font Analyser, an internal tool developed specifically for what we need.

Font Analyser allows me to do several things:

Generate a list of all the characters that will be required for a specific set of languages.

Generate lists of all the characters used in a game per language, or per language set.

Generate a list of all the characters supported by a font file.

Compare the lists above to flag potentially corrupted characters used in a game.
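Font Analyser itself is an internal tool, so the following is only a sketch of how the comparison step could work, not its actual implementation. It assumes the characters supported by the font have already been extracted into a set (reading them from a font file would need a font-parsing library, which is left out here); string_table, font_chars and the function names are hypothetical.

```python
def characters_used(strings):
    """Collect every distinct character that appears in a list of strings."""
    return {ch for text in strings for ch in text}


def unsupported(used, supported):
    """Characters used in the game that the font has no glyph for."""
    return sorted(used - supported)


# Hypothetical string table: language code -> localized strings.
string_table = {
    "es": ["Año nuevo", "¿Continuar?"],
    "pl": ["Zapisz grę"],
}
# Hypothetical set of characters supported by the font.
font_chars = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ?¿ñ")

report = {
    lang: unsupported(characters_used(strings), font_chars)
    for lang, strings in string_table.items()
}
print(report)  # {'es': [], 'pl': ['ę']}
```

Because the check works on sets of characters rather than on screens, it covers every string in the table, including those a tester would rarely trigger during a playthrough.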

Two different approaches

Most games I’ve worked on that are localized into Asian languages (mainly Japanese, Korean and Chinese) use separate fonts for these languages. Using the same font for all supported languages is definitely possible, but UI teams normally have very specific artistic and stylistic requirements, which limit your choice of viable Asian fonts. Also, fonts selected for the European languages usually do not support Asian languages. However, the main reason why two approaches are needed is memory usage. All European languages combined will need no more than 300 characters, whereas a Korean translator has a pool of over 10,000 characters to choose from within a given font. Including all the Asian characters available in the font in the generated textures would cause memory issues, so the list needs to be trimmed down to include only the characters that are actually used by the translators.
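A rough back-of-the-envelope calculation shows the scale of the difference. The 32 x 32 pixel glyph cell and the one byte per pixel are assumptions chosen for illustration; real figures depend on the font size, the texture format and how tightly the glyphs are packed.

```python
def texture_bytes(glyph_count, cell_px=32, bytes_per_pixel=1):
    """Estimate texture size if every glyph takes one square cell (assumed)."""
    return glyph_count * cell_px * cell_px * bytes_per_pixel


european = texture_bytes(300)      # all European languages combined
korean = texture_bytes(10_000)     # every character in a Korean font

print(f"{european / 1024:.0f} KiB vs {korean / (1024 * 1024):.1f} MiB")
# 300 KiB vs 9.8 MiB
```

The estimate is per font and per size, so a game that uses several fonts at several sizes multiplies it accordingly.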

Approach 1: Short-alphabet languages. This approach should be used for any language that uses 500 characters or fewer. Your tool should provide a multiselection language picker that includes all the short-alphabet languages: English, French, Italian, Spanish, German, Portuguese, Polish, Russian, Scandinavian languages, Hungarian, Czech and so on. Once all of the supported languages are selected, the tool should then generate a complete list of all the necessary characters for all of these combined languages. Do not forget to include all English letters, numbers, punctuation characters, special symbols (©®™), the non-breaking space, the ellipsis character and language-specific punctuation characters such as German inverted quotes. Once you have the complete list of characters you will need to localize video games in these short-alphabet languages, compare it with the chosen game fonts to confirm that they contain glyphs for all of these characters. Then, provide this list of necessary characters to the UI team so that they can use it to generate the font textures.

Ideally, the game should use as few language-specific fonts as possible, so the more languages your main fonts support, the better, since you will be able to use the same font textures for all the supported languages.

Approach 2: Long-alphabet languages. This approach should be used for all languages whose alphabets cannot be fully supported in the font textures due to memory limitations, such as Japanese, Korean or Chinese. You should start by compiling a list of characters that should always be supported:

Both kana alphabets for Japanese

Double-byte punctuation symbols

All characters found in the terminology lists of first-party publishers

Supporting these “default” lists of characters per language from the start will reduce the need to regenerate the font textures near the end of the project, which should be avoided to reduce risks of introducing new bugs.
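The kana part of such a default list does not have to be typed by hand. The sketch below, which is only one possible way to do it, pulls the letters out of the Unicode Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks by name; the helper named_characters is hypothetical. The prolonged sound mark is added explicitly because its Unicode name does not start with either prefix.

```python
import unicodedata


def named_characters(first, last, prefix):
    """Characters in [first, last] whose Unicode name starts with prefix."""
    return [
        chr(cp)
        for cp in range(first, last + 1)
        if unicodedata.name(chr(cp), "").startswith(prefix)
    ]


hiragana = named_characters(0x3040, 0x309F, "HIRAGANA LETTER")
katakana = named_characters(0x30A0, 0x30FF, "KATAKANA LETTER")

# The prolonged sound mark (U+30FC) is shared by both scripts and is
# not named "... LETTER", so it is appended by hand.
default_japanese = hiragana + katakana + ["\u30fc"]
```

Double-byte punctuation and the first-party terminology characters would still need to be added to this list from their own sources.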

About a month before submission to first-party publishers, the supported character lists for the Asian languages should be locked, at which point Asian translators should refrain from using characters that are not in the list. Because this is sometimes hard to achieve, your tool should scan the string table often to confirm that no unsupported characters have been introduced prior to submission.
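One way to run that recurring scan is to report, for each string ID, the characters that fall outside the locked list, so the result can be handed straight back to the translators. This is a sketch under assumptions: the string IDs, the sample strings and the function find_violations are invented for the example.

```python
def find_violations(strings_by_id, locked_chars):
    """Map each string ID to the characters it uses outside the locked list."""
    violations = {}
    for string_id, text in strings_by_id.items():
        extra = sorted(set(text) - locked_chars)
        if extra:
            # Include the code point so the character can be identified
            # even if it cannot be displayed in the report itself.
            violations[string_id] = [f"{ch} (U+{ord(ch):04X})" for ch in extra]
    return violations


# Hypothetical locked list and string table.
locked = set("こんにちは世界。")
strings = {"MENU_HELLO": "こんにちは", "MENU_WORLD": "新世界。"}

violations = find_violations(strings, locked)
for string_id, chars in violations.items():
    print(string_id, chars)
```

Running a check like this on every string table update, rather than once before submission, keeps the number of late surprises small.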

What characters will your game need? The following lists contain the characters your game fonts should always support, regardless of the language:

English alphabet letters: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz

Arabic numerals: 0123456789

Punctuation symbols: !#$%&()*+,-./:;<=>?@[]`{|}´…–‘’'"“”_^

Other characters: ¢£€©®°™

Always support the ellipsis character (Alt+0133) and the non-breaking space (Alt+0160). These two characters will prove extremely useful for localization QA.

The following list contains all the additional characters you will need for each of the most common languages:

French: àâæéèêëîïôœùûüçÀÂÆÉÈÊËÎÏÔŒÙÛÜÇ

German: ÄÖÜäöüß‚„

Italian: àéèìòùÀÉÈÌÒÙªº

Spanish: áéíóúüñÁÉÍÓÚÜÑ¿¡ªº

Portuguese: àáâãçéêíóôõúüÀÁÂÃÇÉÊÍÓÔÕÚÜ

Polish: ąćęłńóśżźĄĆĘŁŃÓŚŻŹ‚„

Czech: áčďéěíňóřšťúůýžÁČĎÉĚÍŇÓŘŠŤÚŮÝŽ‚„

Russian: абвгдеёжзийклмнопрстуфхцчшщъыьэюяАБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ№

Dutch: àáèéêëïóöÀÁÈÉÊËÏÓÖ

Danish: åæøÅÆØ

Norwegian: åæøÅÆØ

Finnish: åäöšžÅÄÖŠŽ

Swedish: åäöÅÄÖ
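The lists above translate naturally into data for the language picker described in Approach 1. The sketch below shows the idea with four languages only, and it leaves out the punctuation and symbol lists for brevity; BASE, EXTRA and required_characters are hypothetical names, not part of any existing tool.

```python
# Characters every font should support (punctuation omitted for brevity).
BASE = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "\u2026\u00a0"  # ellipsis and non-breaking space
)

# Additional characters per language, copied from the lists above.
EXTRA = {
    "French": "àâæéèêëîïôœùûüçÀÂÆÉÈÊËÎÏÔŒÙÛÜÇ",
    "German": "ÄÖÜäöüß‚„",
    "Spanish": "áéíóúüñÁÉÍÓÚÜÑ¿¡ªº",
    "Swedish": "åäöÅÄÖ",
}


def required_characters(languages):
    """Combine the base list with the extras of every selected language."""
    chars = set(BASE)
    for language in languages:
        chars.update(EXTRA[language])
    # A set removes characters shared by several languages (ü, for example).
    return "".join(sorted(chars))


print(required_characters(["French", "German"]))
```

The resulting string is what would be handed to the UI team to generate the font textures, and it is also the list to compare against the glyphs available in each candidate font.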

When a special character (any character outside the 128-character ASCII table or its 256-character extended code pages) is integrated in the game in Unicode, the code sometimes performs an internal encoding conversion at run time. For example, the UTF-8 bytes of an accented character might be read back as if they belonged to a single-byte code page. Very often, encoding conversion issues will be misinterpreted as font corruption issues. Differentiating these two issues is normally straightforward, although sometimes these types of bugs go through a few people until someone in the development team finally realizes that the font is not to blame. When a character is not properly converted between two different encodings, you will most likely see each special character replaced by two characters. Quite often, one of them will be an Ã (an A with a tilde).
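The two-character symptom is easy to reproduce. In the sketch below, the UTF-8 bytes of ñ are deliberately decoded with the wrong encoding (Latin-1 is used here as a stand-in for whatever single-byte code page the game code might assume). The helper looks_like_mojibake is a hypothetical heuristic, not a complete detector.

```python
original = "ñ"
utf8_bytes = original.encode("utf-8")    # two bytes: 0xC3 0xB1
garbled = utf8_bytes.decode("latin-1")   # each byte becomes its own character
print(garbled, len(garbled))  # Ã± 2


def looks_like_mojibake(text):
    """Heuristic: True if undoing a Latin-1 misread yields shorter text."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return len(repaired) < len(text)


print(looks_like_mojibake(garbled), looks_like_mojibake(original))  # True False
```

If the font texture contains glyphs for both replacement characters and they are drawn correctly, the font is doing its job and the bug belongs to the conversion code.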

Other font issues to look out for

Sometimes, ensuring that all necessary characters are supported by the game fonts and included in the font textures is not enough to eradicate all font-related issues. It is possible that certain glyphs in a font were not properly designed, so it is important to go through all the supported glyphs to ensure their look is appropriate. This is usually only doable for European languages, since some Asian languages can contain several thousand glyphs.

Beware of certain fonts designed for mobile games. These fonts are often height-restricted, which is not a problem if you plan to use them with English text only. However, once displayed on screen, certain accented characters could look smaller than the unaccented uppercase characters.

To avoid this issue, we (I must give credit to my colleague Frédéric Tabbal here) decided to create a tool that would render all of the special characters on a single form so that we could easily review their design.
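Our internal tool is not reproduced here, but a similar review sheet can be approximated with a generated HTML page, assuming the game font is installed on the reviewer's machine. In this sketch each special character is placed next to a reference H so that height differences stand out; the function review_sheet and the font name MyGameFont are placeholders.

```python
import html


def review_sheet(characters, font_family="MyGameFont", size_px=48):
    """Build an HTML page showing each character next to a reference 'H'."""
    cells = "".join(
        f"<span title='U+{ord(ch):04X}'>H{html.escape(ch)}</span> "
        for ch in characters
    )
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'></head>"
        f"<body style=\"font-family: '{font_family}'; font-size: {size_px}px\">"
        f"{cells}</body></html>"
    )


page = review_sheet("ÀÉÑÖŠ")
# To review, save the page as UTF-8 and open it in a browser, for example:
# open("review.html", "w", encoding="utf-8").write(page)
```

A browser falls back to another font when a glyph is missing, so this kind of sheet is only useful for judging the design of glyphs the font is already known to contain.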

Watch out for the credits. Depending on where the development studio is located, you might also need to support certain language-specific characters in your fonts, even if your game is not localized into that language. For example, if your development team is located in Sweden, the selected fonts for all languages should include glyphs for the Swedish special characters, since the developers’ names in the credits will certainly use some of these. Keep this in mind before purchasing and committing to all your fonts or you will be forced to destroy some developers’ names.

Even if you follow all of the suggestions I have provided here, you might still end up with unsupported characters in your game. Quite often, the corrupted characters do not come from the game assets but from external assets that are not part of the centralized set of game strings. Some of these assets may be first-party metadata, Message of the Day texts, Privacy Policy or Terms of Service documentation. It is very important that these texts are also taken into account so that they do not use any unsupported characters.

For example, let’s suppose the console region is set to a locale supported by the console but not by the game (Czech, for example). The game text will be in English, but the price of the items on the in-game store could use a currency symbol that is not supported by the game font (character č used in the abbreviation for the Czech koruna).

If you use different fonts for certain languages, keep in mind that their in-game display could vary in size, which could reduce the number of characters that can be used for certain translations and might cause overlaps between various text boxes in the UI.

If your video game includes a chat system in its multiplayer mode, be aware that players might use unsupported characters when chatting online. It is normally accepted by first-party publishers that certain characters can be unsupported in this environment, but you need to be aware that certain UI systems could try to convert the unsupported characters into other glyphs supported by the game font. Since you have no control over the characters a player could enter, you simply need to ensure that the font textures do not support certain politically incorrect characters. I have actually seen an unsupported character get converted to the swastika symbol in the past.
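A complementary safeguard, where the engine allows it, is to replace unsupported characters yourself before the text reaches the UI system, so that nothing is left for it to remap. The sketch below assumes the game can intercept chat messages and knows which characters its font supports; sanitize_chat and the sample character set are hypothetical.

```python
def sanitize_chat(message, supported, placeholder="?"):
    """Replace every character the font cannot draw with a safe placeholder."""
    return "".join(ch if ch in supported else placeholder for ch in message)


# Hypothetical set of characters present in the font textures.
supported = set("abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ?!")

print(sanitize_chat("gg ☃ wp!", supported))  # gg ? wp!
```

The placeholder must itself be a supported character, otherwise the problem simply moves one step further down the pipeline.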

Conclusion

Whether you are a developer, a translator, a localization tester or a localization project manager, if you work in video games, you need to be aware of these issues and approach them in a proactive manner early in development. Ideally, the fonts should be analyzed and validated before localization QA starts. Relying on the testers’ eyes to flag these issues is not a good idea, especially since current AAA games typically contain over 100,000 words of text and are localized into six languages or more. The cost of developing a font analysis tool is a minimal fraction of the costs needed to cover the QA, localization and development resources to flag and fix these issues later on in the process.