Learning localization in context

When I started my career as an English-Japanese translator right after graduating with a bachelor’s degree in literature and linguistics, software localization was completely foreign territory to me. I knew little about software development and programming. What, for example, were the letters with a percent sign, such as %s or %d, that sometimes appeared in my texts? Why did they sometimes include a number, like %1s? These questions were soon answered by a fellow translator: I learned that they were called placeholders and that the application replaces them with actual values, such as a name or a number, when it runs.

Placeholders are a relatively simple problem; one of the most difficult hurdles for translators, who often work remotely, is the lack of context. They receive text strings to translate into another language, mostly without contextual information such as screenshots of the user interface. Once, while translating, I was frustrated to find only the word “Sun” in a spreadsheet of source-language text. Did it mean the star at the center of the solar system, the abbreviation for Sunday, the company Sun Microsystems or a person’s surname? Nobody can confidently translate something like this without contextual information. I frequently wished I could display the translated strings in the user interface immediately after finishing a translation to check whether they appeared correctly in context.

After working for several years as a translator, I started studying software development in graduate school. There I learned how to create an application that supports multiple languages, and how date and number formats differ by locale. For instance, I had not known until then that the decimal separator in German was a comma rather than a period. From this experience, I can say that even a little background in programming and in localizing an internationalized application helps software translators understand how multilingual software works.

However, there is always a cost. Learning the basics of software programming and internationalization requires a certain amount of time and effort. Once again, I often wished for good educational material that would teach the mechanisms of multilocale software and increase my knowledge of different locales.

Because I couldn’t easily find anything that did this, I made Expense Recorder, a software application that shows translated strings right after translation and supports different locales. Anyone can download it for free at http://research.nishinos.com/training-app.

Expense Recorder is a web application that can record and track expenses in an office. After logging in to a dummy account, users can add information such as a date, an amount of money or a category of expense (such as stationery or transportation). The interface is shown in Figure 1. Through the experience of actually translating this application and looking at different formats in different locales by switching the interface language, translators can gain knowledge about software localization and internationalization.

To use and translate the application, you do not need any special tools, such as the integrated development environment that a professional programmer uses for software development. The application runs in a common web browser, and its text strings can be translated with a text editor that comes with a personal computer. It uses only HTML and JavaScript.

As a translator, you can learn various things from using the training application. First and foremost, as this is a web application, you can learn how to localize software by translating the text strings used in user interfaces and help documents. After finishing the translation and saving the language resource files, you can display the translated strings in the user interface just by reloading the HTML file in the browser. You can then check whether the strings suit the context. If they are not appropriate, you can translate and check again until you are satisfied with them. Figure 2 is an example of translation. The upper part shows the text before translation in English, and the bottom part shows the text after translation in Japanese.

Although a computer-assisted translation (CAT) tool is not necessary for translating language resource files, you can use one to practice working with such a tool. To translate a help document file of the training application, a common CAT tool such as Trados or OmegaT will do because the help document uses HTML files. Translating a user interface file with a CAT tool, on the other hand, requires a small trick. The user interface file is a JavaScript file, which cannot be directly translated with a common CAT tool. You need to copy the translatable parts, paste them into a new text file and then change the file extension to .json (JSON stands for JavaScript Object Notation). You can use OmegaT, for instance, to translate a JSON file by adding a JSON filter. For details about how to add a JSON filter to OmegaT, visit the website of the training application. If you use other CAT tools, you may need to create a JSON filter yourself. Note, however, that a CAT tool is not a requirement. You can translate all language resource files simply with a text editor.

With the free Expense Recorder tool, translators can learn how placeholders function in a text string. Figure 3 shows a part of the interface that is displaying the number of registered items and the total amount of money, and Figure 4 shows the corresponding string in the language resource file.

Translators will translate the underlined part. The special marks, %1$s and %2$s, are the placeholders. If you look at the user interface while translating, you can guess that the number of items is put in the first placeholder and the total amount of money is put in the second placeholder. Translators may try switching the order of placeholders or even try deleting them to see what happens in the application. Such an experiment is possible only in a training application, and it should be a great experience for translators, especially if they have never translated strings with placeholders before.
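The behavior of numbered placeholders can be sketched in a few lines of Python. This is only an illustration, not the code of Expense Recorder: the function name fill_placeholders and both sample strings are assumptions made for this example. It shows why a translator may freely move %1$s and %2$s within a sentence: the number, not the position, decides which value is inserted.

```python
import re

def fill_placeholders(template, *values):
    """Replace positional placeholders such as %1$s with the matching value."""
    def substitute(match):
        index = int(match.group(1)) - 1  # %1$s refers to the first value
        return str(values[index])
    return re.sub(r"%(\d+)\$s", substitute, template)

# Hypothetical strings; the real resource file may be worded differently.
english = "%1$s items registered, %2$s in total"
reordered = "Total: %2$s (%1$s items)"

print(fill_placeholders(english, 3, "$45.00"))    # 3 items registered, $45.00 in total
print(fill_placeholders(reordered, 3, "$45.00"))  # Total: $45.00 (3 items)
```

In this sketch, deleting a placeholder from the string simply means that its value never appears in the output.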

Formats for time or number vary by locale. A translator can see how such formats differ in context by changing the locale setting in the application. The application supports 16 locales as of version 1. For example, if you enter 12,345.00 in the money field and then change the locale, you can find different formats, as shown in Figure 5. The topmost entry is for US English, the middle one is French, and the bottom one is German. Unlike the US locale format, the French locale uses a space for a thousands separator and a comma for a decimal separator, and the German locale uses a period for a thousands separator and a comma for a decimal separator.
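The separator differences described above can be reproduced with a short Python sketch. The table of separators and the function name format_amount are assumptions made for this illustration; real applications usually take these conventions from a locale library, which may use a no-break space rather than a plain space for French.

```python
# Separator conventions as described in the text: (thousands, decimal).
# Illustrative only; not taken from the Expense Recorder source.
SEPARATORS = {
    "en-US": (",", "."),
    "fr-FR": (" ", ","),
    "de-DE": (".", ","),
}

def format_amount(value, locale):
    """Format a number with two decimals using the locale's separators."""
    thousands, decimal = SEPARATORS[locale]
    # Format in the US style first, then swap in the locale's separators.
    us_style = f"{value:,.2f}"
    integer_part, fraction_part = us_style.split(".")
    return integer_part.replace(",", thousands) + decimal + fraction_part

for locale in SEPARATORS:
    print(locale, format_amount(12345.00, locale))
# en-US 12,345.00
# fr-FR 12 345,00
# de-DE 12.345,00
```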

Date format is another example that differs by locale. Some locales express dates in the order month/day/year, and others use day/month/year or year/month/day. You can compare these formats by simply changing the display language of the application. Figure 6 shows different date formats from three locales. The upper is the US format, the middle is the UK format and the bottom is Japanese.
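The three field orders can be illustrated with Python's standard datetime module. The patterns below are assumptions chosen to show only the order of the fields; the exact notation in Figure 6 may differ, for example in the separators used.

```python
from datetime import date

# Field order per locale, as described in the text; illustrative patterns only.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month/day/year
    "en-GB": "%d/%m/%Y",  # day/month/year
    "ja-JP": "%Y/%m/%d",  # year/month/day
}

def format_date(day, locale):
    """Format a date using the field order of the given locale."""
    return day.strftime(DATE_PATTERNS[locale])

sample = date(2013, 7, 5)
for locale in DATE_PATTERNS:
    print(locale, format_date(sample, locale))
# en-US 07/05/2013
# en-GB 05/07/2013
# ja-JP 2013/07/05
```

The US and UK outputs show why the order matters: the same digits can be read as July 5 or as May 7.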

Another function that the training application offers is switching the displayed strings based on the noun plural form. Languages like Chinese or Japanese have only one noun form (singular), and languages like French or English have two noun forms (singular and plural). To make things more complicated, Arabic has six and Russian has four different forms. If you need to translate from a language with only one noun form into a language with two or more, you are often forced to write text like “You have x item(s).” This method does not look neat, and might not work well for languages with more than two noun forms. However, the training application can hold up to six variants of one displayable text and switch among them based on the quantity the noun refers to. Figure 7 shows the six variants for one displayable text. A blank means that the language has no applicable plural form. Looking at this list of texts, localization staff will learn that there are languages with multiple plural forms and that they need to be careful when localizing software.
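The idea of switching among several texts can be sketched as follows. This is a simplified illustration, not the mechanism used inside Expense Recorder: the category names follow the ones commonly used in Unicode locale data, the rules cover whole numbers only, and the function names and message strings are assumptions made for this example.

```python
def plural_category(count, language):
    """Return the plural category for a whole number in the given language."""
    if language == "ja":
        return "other"  # a single form covers every number
    if language == "en":
        return "one" if count == 1 else "other"
    if language == "ru":
        last_digit, last_two = count % 10, count % 100
        if last_digit == 1 and last_two != 11:
            return "one"
        if 2 <= last_digit <= 4 and not 12 <= last_two <= 14:
            return "few"
        return "many"
    raise ValueError(f"no plural rule for {language}")

# Hypothetical message table: one text per plural category of each language.
MESSAGES = {
    "en": {"one": "You have %d item.", "other": "You have %d items."},
    "ja": {"other": "項目が%d件あります。"},
    "ru": {
        "one": "У вас %d элемент.",
        "few": "У вас %d элемента.",
        "many": "У вас %d элементов.",
    },
}

def message(count, language):
    """Pick the text that matches the count, then insert the count."""
    return MESSAGES[language][plural_category(count, language)] % count

for n in (1, 3, 11, 21):
    print(message(n, "en"), "|", message(n, "ru"))
```

With this approach the translator writes one complete sentence per form, so “item(s)” is never needed.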

According to GALA’s definition, internationalization is “The process of generalizing a product so that it can handle multiple languages and cultural conventions without the need for redesign,” and localization “Describes the process of adapting a product to a specific international language or culture so that it seems natural to that particular region, which includes translation, but goes much farther.” To put it simply, programmers internationalize a software product and then translators localize it. While internationalization and localization are inseparable in this respect, translators often have little knowledge of internationalization. They certainly do not need to internationalize a software product themselves, but they should at least know what internationalization and localization are.

Because the training application is fully internationalized but not yet localized, translators can see the border between the two. The calendar, for instance, is internationalized, so translators do not need to translate the names of months or days. The portion that translators have to deal with is the target of localization. Knowledge of internationalization will be essential in the era of agile development, where translators often work closely with programmers.

I held a two-hour localization seminar using the Expense Recorder application at the Tokyo Institute of Technology in July of 2013. Attendees were mainly university students and professional translators. According to the results of a small survey after the seminar, almost all of them believed that the training application was useful for learning about software localization.

Expense Recorder is only one of many applications that could be created to teach students about localizing software in context. I hope more training software of this kind is developed, and that many institutions use it to help students and translators improve their knowledge of and skills in software localization.