Focus

Localization workarounds for non-internationalized software

Blake Madden

Blake Madden was a software localization engineer and project manager for 18 years. He specialized in localizing legacy systems, as well as modern .NET products. He is now a data analyst in higher education — where he no longer has to worry about what not to translate.

Ensure that ample space is provided for translations. Only include translatable strings in the resources. Never piece strings together to form a larger message. These are common internationalization practices that localization teams rely on for a smooth translation process. Unfortunately, there can often be roadblocks preventing these practices from being implemented.

Localization teams may have little influence during the software development process — a vital time for internationalization. Also, internationalization may not be a priority, and it is assumed that the localization teams will work around any issues that they encounter. The following is a discussion of such workarounds, as well as a cautionary tale of how counterproductive — and expensive — these workarounds can be.

What not to translate

Internationalization practices recommend that only translatable strings should go into the resources. Strings not intended for translation — for example, internal constants — should not go into resources. Doing this ensures that translating anything in the resources is safe; translators never have to question if something should be left untouched. This is particularly essential when translators are leveraging large-scale translation memory.

Unfortunately, localization teams may encounter software that stores all of its strings in its resources. In this situation, software engineers may rely on a system of “storytelling” to indicate what should not be translated. For example, there may be “DONTTRANSLATE!” tags in front of strings not meant to be edited. The problem with this is that a translator can still alter any message that starts with this tag. A project-wide find and replace or the leveraging of translation memory/computer-aided translation can easily change a “DONTTRANSLATE!PAGE” into “DONTTRANSLATE!Seite” without the translator noticing it.

A workaround is to configure your translation software to ignore such strings; however, relaying this information to every translation team isn’t always reliable. And this also assumes that all the translators have software that supports this feature. A solution for this is to develop custom software to review the translators’ final work and verify that every “DONTTRANSLATE!” message in their output matches the respective message from the English resources.
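Such a review tool can be quite small. The following is a minimal sketch in Python, assuming the resources have already been extracted into dictionaries keyed by resource ID; the IDs and the function name are hypothetical, and only the “DONTTRANSLATE!” tag comes from the example above.

```python
TAG = "DONTTRANSLATE!"  # tag spelling taken from the example in the text


def find_altered_protected(english, translated):
    """Return IDs of tagged English resources whose translation is missing or differs."""
    problems = []
    for res_id, source in english.items():
        # A tagged resource must come back from translation byte for byte.
        if source.startswith(TAG) and translated.get(res_id) != source:
            problems.append(res_id)
    return problems


# Hypothetical resource IDs, for illustration only.
english = {"IDS_PAGE": "DONTTRANSLATE!PAGE", "IDS_TITLE": "Page setup"}
german = {"IDS_PAGE": "DONTTRANSLATE!Seite", "IDS_TITLE": "Seite einrichten"}
print(find_altered_protected(english, german))  # reports the altered IDS_PAGE entry
```

A check like this only catches tagged strings; resources that the developers forgot to tag still need to be found through testing.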

It should be noted that these types of resource tags inadvertently add technical debt to the English version as well. Say that we have an HTML-generator program that has resources such as this:

DONTTRANSLATE!html table

The developer placed “DONTTRANSLATE!” tags in front of the HTML tokens to tell translators not to touch them. They also program the application to strip the “DONTTRANSLATE!” prefix when it loads these strings from the resources. But what if they forget that second step on one of the strings? Then users may be presented with output like this:


Log Report


Installation updated at 05:00.


These “DONTTRANSLATE!” tags are no longer just a problem for localization. Now they are something that the quality assurance team has to watch out for as they test the English version.

Localization engineers may also come across something similar: sentinel tags around messages that should not be translated. For example, imagine a string table such as this:

__BEGIN_DONTTRANSLATE table

Here, we have the developers telling us that nothing between “__BEGIN_DONTTRANSLATE” and “__END_DONTTRANSLATE” should be translated. In this situation, we are relying on both special tags and our resources’ geography in the string table to tell us what is unsafe to translate. This means that we need to inform our translators of these messages’ specific IDs to avoid translating them. Like before, another custom tool may be needed to ensure that translators followed these rules.
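As a rough illustration of such a tool, the sketch below assumes the string table is available as an ordered list of (ID, text) pairs and that the sentinels are stored as ordinary entries. That layout, the numeric IDs and the function names are assumptions made for this example.

```python
BEGIN = "__BEGIN_DONTTRANSLATE"
END = "__END_DONTTRANSLATE"


def protected_ids(table):
    """Collect the IDs of entries between the sentinels (sentinels included)."""
    protected, inside = set(), False
    for res_id, text in table:
        if text == BEGIN:
            inside = True
        if inside:
            protected.add(res_id)
        if text == END:
            inside = False
    return protected


def changed_protected(english, translated):
    """Return protected IDs whose translated text no longer matches the English text."""
    locked = protected_ids(english)
    target = dict(translated)
    return sorted(
        res_id for res_id, text in english
        if res_id in locked and target.get(res_id) != text
    )


# Hypothetical string table; order matters because the sentinels rely on position.
english = [("100", "Open"), ("101", BEGIN), ("102", "PAGE"), ("103", END), ("104", "Close")]
german = [("100", "Öffnen"), ("101", BEGIN), ("102", "Seite"), ("103", END), ("104", "Schließen")]
print(changed_protected(english, german))  # flags entry 102
```

The list returned by protected_ids can also be handed to translators as the set of IDs to leave alone.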

Relying on these tags can also be an issue when developers forget to add them to unsafe resources. The localization engineer applying pseudo-translations during the testing phase — followed by translators testing their work — can generally find most of these issues. However, requiring extensive testing and always having a sense of doubt as to what’s safe to translate adds a huge workload to the localization process.

Another practice is the inclusion of unsafe and safe strings in the same resource, where the two are separated by a delimiter. A tag indicating that the string shouldn’t be edited isn’t applicable here, given that part of it is meant to be translated. Here, the engineer will need to rely on intuition, and eventually custom tools, to deal with this.

For example:

ID_ACCOUNT, Account table

Here, the first section of the resource is an ID used internally by the program, and the string after the comma is the user-facing label. Translators will need to be instructed to only translate text after the comma — making the use of translation memory difficult or even impossible. Also, to ensure quality, the localization engineer will need to develop an in-house tool to verify that none of these resources have their front sections translated.
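A verification tool for this kind of resource could look like the following sketch. It assumes that the first comma in each resource separates the internal ID from the label and that resources are held in dictionaries keyed by a resource ID; the keys and names used here are hypothetical.

```python
def split_resource(text, delimiter=","):
    """Split a resource into its internal ID and its user-facing label."""
    internal_id, _, label = text.partition(delimiter)
    return internal_id, label


def changed_internal_ids(english, translated, delimiter=","):
    """Return resource IDs whose section before the delimiter was altered."""
    problems = []
    for res_id, source in english.items():
        source_id, _ = split_resource(source, delimiter)
        target_id, _ = split_resource(translated.get(res_id, ""), delimiter)
        if source_id != target_id:
            problems.append(res_id)
    return problems


english = {"1": "ID_ACCOUNT, Account", "2": "ID_TOTAL, Total"}
german = {"1": "ID_KONTO, Konto", "2": "ID_TOTAL, Summe"}
print(changed_internal_ids(english, german))  # flags resource 1
```

Because the split happens at the first delimiter only, a translated label that itself contains a comma does not trigger a false alarm.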

How much space do I have?

Internationalization practices recommend that you should always allow ample space for translations, particularly from English. This is because most translations are longer than their English counterparts. Although this is a helpful practice that translators rely on, it isn’t always a reality. Sometimes programs are hardcoded to assume their English messages’ lengths and creative localization solutions are needed.

Say that we have the string “Cancel” in our English resources, which is six characters long. If the program is hardcoded to load this resource as six characters, then a German translation (Abbrechen) will appear as Abbrec (the trailing “hen” is clipped off). The workaround here is either abbreviating (e.g., Abbre.) or finding an alternative translation. For longer messages, terser language is sometimes required. For example, say that we have the message “Enter your ID:” consisting of 15 characters. The professional “Geben Sie ihre ID ein:” will be too long, so a terser “Gib ihre ID:” will have to be used.
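Once the fixed-length resources have been cataloged, the clipping itself is easy to predict. The sketch below assumes the program loads exactly as many characters as the English string contains; the resource ID and function name are hypothetical.

```python
def find_clipped(english, translated, fixed_length_ids):
    """Map each fixed-length resource ID to the text that would survive clipping."""
    clipped = {}
    for res_id in fixed_length_ids:
        limit = len(english[res_id])  # assumption: the English length is the hard limit
        target = translated.get(res_id, "")
        if len(target) > limit:
            clipped[res_id] = target[:limit]
    return clipped


english = {"IDS_CANCEL": "Cancel"}
german = {"IDS_CANCEL": "Abbrechen"}
print(find_clipped(english, german, ["IDS_CANCEL"]))  # shows what the user would see
```

Showing translators the clipped result, rather than only a warning, makes it easier for them to choose a suitable abbreviation.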

Sometimes abbreviations and alternate translations aren’t a solution. A prime example is Japanese, where removing a single character can create an entirely different message. If the software being translated is multibyte (not Unicode), then using halfwidth kana may be a solution.

Say that we have the string “Error” translated as エラー. For a Unicode program, this would be a three-character translation that is shorter than its five-character English counterpart. In a multibyte program, however, this is an issue because these three Japanese characters will consume six bytes, one byte more than the English string. This means that at runtime, the last byte will be lost and the string will be corrupted. Using halfwidth kana will consume fewer bytes and allow our translation to fit in the buffer provided. In this example, our translation can be converted to ｴﾗｰ and consume only three bytes, nicely fitting into our five-byte buffer.

The localization team can assist Japanese translators by building custom software to detect and correct when halfwidth kana conversion is necessary. On Windows, the Win32 function LCMapString, with the LCMAP_HALFWIDTH flag, can be used in such a tool to accomplish this.
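For teams whose tooling does not run on Windows, a similar conversion can be approximated with Python’s standard unicodedata module. The sketch below is not LCMapString; it builds a fullwidth-to-halfwidth table by inverting Unicode’s NFKC mapping for the halfwidth katakana block, and it assumes the multibyte program uses code page 932 (Shift-JIS). The katakana エラー is used here as an assumed translation of “Error.”

```python
import unicodedata

# Invert NFKC for the halfwidth katakana block (U+FF61..U+FF9F):
# NFKC turns each halfwidth character into its fullwidth equivalent.
FULL_TO_HALF = {
    unicodedata.normalize("NFKC", chr(cp)): chr(cp) for cp in range(0xFF61, 0xFFA0)
}


def to_halfwidth(text):
    """Convert fullwidth katakana (and related punctuation) to halfwidth forms."""
    out = []
    for ch in text:
        if "\u30a0" <= ch <= "\u30ff":
            # NFD splits voiced kana into a base kana plus a combining sound mark.
            ch = unicodedata.normalize("NFD", ch)
        out.append("".join(FULL_TO_HALF.get(part, part) for part in ch))
    return "".join(out)


def fits(text, byte_budget, encoding="cp932"):
    """Check whether the encoded text fits into a fixed-size byte buffer."""
    return len(text.encode(encoding)) <= byte_budget


translation = "エラー"
if not fits(translation, 5):
    translation = to_halfwidth(translation)
print(translation, len(translation.encode("cp932")))
```

Note that voiced kana become two halfwidth characters (the kana plus a separate sound mark), so a converted string should always be measured again rather than assumed to fit.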

The reverse side of this problem is when a translation is shorter than the English string. For example, say that we have an English string “Ticket” consisting of eight characters due to the extra spaces included for padding. The spaces present a challenge. If we translate this as Karte, then the program will try to load eight characters from a five-character string-table entry and will crash. This will require the translator to count the number of characters in the English string and pad their translation with spaces to ensure that they are the same length. The localization engineer should also assist them by developing a custom tool to verify and correct this whitespace padding with the final translations.
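The padding step is simple to automate. This sketch assumes the padding consists of trailing spaces, as in a “Ticket” string followed by two spaces; the function name is hypothetical.

```python
def pad_to_source_length(source, target):
    """Pad a translation with trailing spaces so it matches the English length."""
    if len(target) > len(source):
        # Too long: padding cannot help, so the translator must shorten the text.
        raise ValueError(f"{target!r} exceeds the {len(source)}-character limit")
    return target.ljust(len(source))


padded = pad_to_source_length("Ticket  ", "Karte")
print(repr(padded), len(padded))
```

If the English string were padded on the left instead, str.rjust would be the matching call; the tool needs to know which convention each resource follows.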

It should be noted that all of this requires extensive testing, investigation and bookkeeping by the localization team. They must find and catalog every problematic resource that the program assumes to be a certain length, which entails time-consuming trial and error. It also requires in-house development of custom localization tools to find and fix these types of issues, as off-the-shelf translation tools can’t provide such complex validation checks.

String flipping

Another internationalization recommendation is to never piece strings together into a larger message. This helps translators understand the context of the strings, as well as enabling them to apply their language’s grammar properly. Unfortunately, programs that aren’t internationalized may instead stitch two or more strings together to form a message. For example, here we have two strings:

Programs stitch two or more strings together to form message table

These strings are concatenated at runtime as “Print Setup.” The translator’s task is a simple matter of translating each of these strings, right? Well, not exactly. For a French version, say that we translate it this way:

Strings are concatenated at runtime as "Print Setup" table

At runtime, it will appear as Impression Configuration. In this case, the French word ordering is backwards and looks unprofessional (Romance languages generally would say “Setup of Printing” rather than “Print Setup”). We can change our translation of “Print” to have the proper context:

At runtime it will appear as Impression Configuration table

This fixes the wording context, but the word-order issue remains. To correct this, we need to use an inelegant workaround: string flipping. Here, the French translator will translate “Print” with their translation of “Setup” and vice versa:

To fix word order we use string flipping table

The result will be Configuration de l’impression, which is what we want.

This solution has numerous drawbacks. One is that the translator must know precisely which strings are being pieced together and where they appear in the program. This requires exhaustive testing by the translator. It also requires time-consuming, trial-and-error investigation by the localization engineer to figure out which strings behave this way.

Another issue with string flipping is that you must ensure that these strings are only used in one place. For example, say that this “Setup” resource is used for the phrase “Print Setup,” but also as the title for a generic “Setup” dialog. In our French version, this “Setup” dialog’s title will appear as de l’impression (“of printing”).

Another interesting example is when an English phrase that is pieced together contains more words than its translation counterpart. In our German version, “Print Setup” may actually be just one word, Druckereinstellungen. The solution is to use the translation for one of the words and leave the other blank:

An English phrase that is pieced together contains more words than its translation counterpart table

A technical obstacle that this may cause is that localization quality checks will warn about the translation for “Setup” being blank. These sorts of quality verifications will need to be customized to ignore such false positives.
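One way to customize such a check is an explicit allowlist of resources that are blank on purpose. The following sketch assumes a blank-string check keyed by locale and resource ID; the IDs and names are hypothetical.

```python
# Hypothetical allowlist: (locale, resource ID) pairs that are intentionally blank.
INTENTIONALLY_BLANK = {("de", "IDS_SETUP")}


def blank_translation_warnings(locale, translated, allowed_blank=INTENTIONALLY_BLANK):
    """Return IDs of blank translations that are not covered by the allowlist."""
    return sorted(
        res_id
        for res_id, text in translated.items()
        if not text.strip() and (locale, res_id) not in allowed_blank
    )


german = {"IDS_PRINT": "Druckereinstellungen", "IDS_SETUP": "", "IDS_HELP": ""}
print(blank_translation_warnings("de", german))  # only IDS_HELP is reported
```

Keeping the allowlist per locale matters because a string that is deliberately blank in German may still need a translation in French.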

The ultimate drawback to string flipping is that it makes our translations unusable for building translation memory. Anyone hoping to extract content from our work to generate glossaries will be disappointed to discover that the German word for “Setup” is a blank string. Their disappointment will only grow when they see that our French translation of “Setup” means “of printing.”

Unusable translation memory, time-consuming trial-and-error testing and confusing translations certainly don’t make string flipping seem like much of a solution — and it isn’t. But when faced with messages being pieced together from smaller strings, hacks like this are what translators are left to work with.

Mega strings

The opposite of multiple strings being concatenated into a single message is the mega string. These single strings are sliced at runtime into separate messages. Consider the following:

Example of mega strings

Although this is stored in the resources as a single string, the program actually views it like this:

10 Character Blocks in English table

At runtime, the program will split this string into four separate chunks (each ten characters long) and then display them in a report:

At runtime, the program will split this string into four separate chunks

Along with being limited to ten characters for each translation, we must be certain that we have the proper length for each section of our translation; otherwise, the other translations will be sliced at the wrong position. For example, let us apply a German translation:

The other translations will be sliced at the wrong position

This looks OK, but at runtime we will see this:

German translation sliced at the wrong position

Something seems to have gone wrong on the second and third lines. This is because we miscounted the number of spaces between “Adresse” and “Stadt.” When the program slices this string into ten-character chunks, the word boundaries are misaligned:

10 Character blocks in German Table

The solution to this is for the localization engineer to develop in-house tools to parse, review and possibly correct translations for resources like this. This custom software will need to know which specific string resources behave this way, along with knowing their chunk lengths.
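A minimal sketch of such a tool follows. It assumes a fixed chunk length of ten characters and uses hypothetical German labels, since the real labels and chunk lengths would come from the catalog of problematic resources.

```python
def split_mega(text, width=10):
    """Slice a mega string into fixed-width chunks, as the program does at runtime."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def build_mega(labels, width=10):
    """Rebuild a mega string, padding every label to exactly `width` characters."""
    for label in labels:
        if len(label) > width:
            raise ValueError(f"{label!r} is longer than {width} characters")
    return "".join(label.ljust(width) for label in labels)


# A translation with one space too few after "Adresse" (hypothetical labels).
miscounted = "Name      " + "Adresse  " + "Stadt     " + "PLZ       "
print(split_mega(miscounted))              # the chunks after "Name" are misaligned
print(split_mega(build_mega(["Name", "Adresse", "Stadt", "PLZ"])))  # aligned again
```

Letting translators supply the labels as a plain list and generating the padded string for them avoids manual space counting altogether. Automatic repair of an existing string is harder, because labels that contain spaces cannot be separated reliably from the padding.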

Embedded placeholders

Often, resources may need to be decorated at runtime — for example, a button label having an icon next to it. Programs usually connect specific resource IDs with icon IDs internally, and then display them together at runtime. Another approach is to embed cryptic syntax inside of a resource, confusing both translation software and translators alike. For example:

Another approach is to embed cryptic syntax inside of a resource

In this resource, “!S[82]” is some sort of placeholder (for an icon? for another string?) and “Account Total” is the user-facing message. One issue is the inherent difficulty of leveraging machine translation or translation memory; the unusual “!S[82]” may confound most off-the-shelf translation tools, forcing translators to translate the safe section of this resource manually and very carefully. The other issue is that if this placeholder accidentally becomes malformed during the translation process, then the program will likely crash when loading it.

If embedded placeholders are too complex for standard translation tools to handle, then the localization engineer will need to develop in-house software to leverage translation memory against these types of strings. To ensure quality, such a tool would also need to review and correct any malformations in the manually performed translations.
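The review half of such a tool can be sketched with a regular expression. The pattern below assumes that every placeholder has the form “!S[number]”, generalized from the single “!S[82]” example; the position of the placeholder within the resource, the resource ID and the function name are likewise assumptions.

```python
import re

# Assumed placeholder syntax, generalized from the "!S[82]" example.
PLACEHOLDER = re.compile(r"!S\[\d+\]")


def placeholder_mismatches(english, translated):
    """Return IDs whose translation lost, altered or reordered a placeholder."""
    problems = []
    for res_id, source in english.items():
        target = translated.get(res_id, "")
        if PLACEHOLDER.findall(source) != PLACEHOLDER.findall(target):
            problems.append(res_id)
    return problems


english = {"IDS_TOTAL": "!S[82]Account Total"}
damaged = {"IDS_TOTAL": "!S[82Kontosumme"}  # closing bracket lost during translation
print(placeholder_mismatches(english, damaged))  # flags IDS_TOTAL
```

The same pattern can be used to swap placeholders for neutral tokens before a resource is sent through translation memory and to restore them afterward.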

Well, the English version looks good

In the past, translators were required to manually resize user-interface content. To simplify this process, modern technologies like Windows Presentation Foundation (WPF) introduced features such as dynamically sizing controls to fit their content. These features are implemented during the internationalization phase to ensure that the English and localized versions all look correct.

Say that we want to have a two-button dialog where the buttons should be lined up in a column and have the same width. In WPF, the recommended way to accomplish this is to place the buttons inside of a stack panel and set their widths to “Auto.” The “Auto” width will tell the buttons to fit their content and the stack panel will stretch the buttons to the same width. The benefit is that translators will not need to do anything for the controls to fit their content — the program will handle this dynamically.

Unfortunately, developers may not take advantage of this feature and instead simply use hardcoded widths. For example, here is XAML code using hardcoded widths for two buttons:


This will produce the UI that we wanted for our English version:

Save options in English table

Applying our German translations will yield a truncated button, though:

Save option in German will yield a truncated button

Even if our translators have tools capable of editing XAML files in design mode, expecting them to manually review and adjust every control will be time-consuming and expensive.

A solution is for the localization engineer to develop an in-house tool that parses the XAML files after translations have been applied and adjusts the UI elements. One adjustment would be to change the width attributes for certain types of controls (buttons, labels and so on) from hardcoded values to “Auto.”


Other adjustments could include making the dialogs wider, tweaking Margin attributes, and adding MinWidth attributes to the controls. Although setting controls’ widths to “Auto” will resize them to their translations, some of these other tweaks may be necessary to make the controls a uniform width. For example, the tool could be programmed to change all buttons to autofit and also tweak this specific dialog’s width to 400:



This will fix the localized version:

Save options in German after the code is applied

By automating this, the UI adjustments can be done by the localization engineer, rather than forcing multiple teams of translators to do this manually for hundreds, if not thousands, of controls.
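As a rough sketch of such a tool, the following Python code treats a XAML file as plain XML. The sample markup is hypothetical and stands in for a real dialog. A production tool would also need to preserve comments and namespace declarations, which ElementTree does not guarantee.

```python
import xml.etree.ElementTree as ET

WPF_NS = "http://schemas.microsoft.com/winfx/2006/xaml/presentation"
ET.register_namespace("", WPF_NS)  # keep WPF as the default namespace on output

AUTOSIZE_TAGS = {"Button", "Label"}  # control types to switch to autofit


def autosize_controls(xaml, window_width=None):
    """Set hardcoded widths on selected controls to Auto; optionally widen the root."""
    root = ET.fromstring(xaml)
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # strip the "{namespace}" prefix
        if local_name in AUTOSIZE_TAGS and "Width" in element.attrib:
            element.set("Width", "Auto")
    if window_width is not None:
        root.set("Width", str(window_width))
    return ET.tostring(root, encoding="unicode")


# Hypothetical dialog with hardcoded button widths and German labels.
SAMPLE = """<Window xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" Width="200">
  <StackPanel>
    <Button Width="60" Content="Speichern" />
    <Button Width="60" Content="Speichern unter..." />
  </StackPanel>
</Window>"""

print(autosize_controls(SAMPLE, window_width=400))
```

Per-dialog tweaks such as the window width are passed in as parameters here; in practice they would more likely be read from a configuration file that the localization engineer maintains for each product.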

The moral of the story

With all these localization workarounds available, is internationalization even necessary? Well … yes. When various internationalization practices aren’t used, it becomes necessary for localization engineers to spend a massive amount of time and effort testing, reverse engineering and cataloging issues that translators will encounter. On top of this, they will need to develop software tools to automate workarounds that off-the-shelf translation products were never designed for. Finally, translators will need to spend as much time testing and sidestepping issues as they do actually translating. Although localization engineers can always be counted on to provide workarounds, the expense of these workarounds should be considered when planning internationalization.