Localization of machine software

By François Massion April 12, 2011

There is really no lack of literature about software localization. Many interesting articles describe at length what it takes to localize a program, starting with the globalization of software in the development phase to make it match the requirements of different countries and languages, up until the production of multilingual online help. These contributions deal mainly with office applications running in a Windows, Linux or Mac environment.

The rapid progress in the automation of industrial production, the widespread use of electronics in everyday life, and the internet as a communication and cooperative platform have put some strain on traditional localization approaches. This evolution brings new challenges for translators and software developers alike. About 50 years ago, the first numerical control machines appeared on the market. Today, you can find software texts in situations as diverse as cars, operating rooms and common home appliances. And, of course, the user expects to read and understand all the messages and commands in his or her mother tongue.

So far, the localization of machine software and embedded system software has attracted little or no attention from the localization industry, even if (or rather because?) the difficulties are quite large. The result is that many manufacturers of numerically controlled machines, plants or equipment often bend over backwards in time-consuming and costly ways to localize their software. Translators have their difficulties with machine texts as well. This situation is partly due to a lack of awareness of the internationalization and localization process on the part of the software engineers, and partly due to the fact that generally accepted localization concepts are missing in this area. Many manufacturers standardize the programming of their applications, though, as is the case with the IEC 61131-3 or with the ISO 14649 (STEP-NC) standard.

When it comes to the localization of machine software, companies are facing development, linguistic and organizational challenges. In order to better understand what distinguishes the localization of machine software from “normal” localization projects, let’s first summarize the classical localization process.

Traditionally, or at least ideally, software would be developed right from the beginning for international use. How this is implemented may differ somewhat depending on the programming language, but the basic principle is that the texts of various programming objects are stored in separate files (usually called resource files) together with some meta information such as the object type (messages, menu items, buttons, dialog titles, field names and so on) and possibly the object ID. In the case of Windows .NET applications, programmers generate so-called satellite assemblies, which include translatable strings. In other programming languages, the software stores these strings in similar files, such as ResourceBundle in Java applications. Once the resource files have been generated, they can be processed with localization tools such as SDL Passolo, Visual Localize or Alchemy CATALYST. These programs use a parser (filter) for binary source files (DLLs or EXE files) and for other formats (XML files) and import the texts to be translated into the translation editor with some additional information if available. In the case of programming languages such as Visual C++ or C#, these localization tools offer a preview of the dialog in the translation editor. The translator can adjust the length of a field or object if he or she needs more space for the translation.

Several applications for machines and devices are also programmed in languages such as C++. Whenever the developers have adhered to commonly accepted localization concepts, these applications can be localized “normally,” as is the case with other programs. In practice, though, not all developers of machine software are familiar with the localization process and its methods, and many do not use all the possibilities offered by their programming language. Therefore, they sometimes choose complicated and not always reliable approaches to localize their machine software.

Many machine applications are written in specific programming languages such as EXAPT, COMPACT, Siemens S7, APT or in high-level languages like C for the programming of microcontrollers and use their own compiler. It is therefore particularly demanding for the developers and the translators alike to process the translated texts. The issues involved are manifold.

First, the text must be made available to the translator in an editable format. Once translated, the text must be imported back into the machine software.

The encoding of special characters must be supported for many languages, such as Spanish, French, Asian languages or bidirectional languages such as Arabic and Hebrew.

Due to length restrictions, the space available on the screen or on the machine display often only allows a limited number of characters. Depending on the situation, some applications work with one or more lines of text per message. The maximum text length can be specified as a number of characters or in pixels.

Programmers insert variables, shortcuts and line breaks as well as escape sequences in the text to be translated, and the translator should deal with all of them. These should also fit in the linguistic context of the target language.

Many documents can hardly be understood without contextual information or additional explanations. In most cases, the translator never sees the final result of his or her translation (for example, all texts and objects displayed in the same dialog) and has thus no possibility to check the translation in context.

What would be the typical workflow of a localization project for machine software? The developer first exports the text to be translated. This text is then prepared for translation by the translator or by an agency. After completion of the translation, the quality assurance is performed, which checks both technical and linguistic aspects of the project. Subsequently, the translation is exported back into the original format and sent to the developer, who imports it into the machine software and, if necessary, adjusts the translated strings in length and so on. Unfortunately, many companies do not always take the necessary step of testing the localized version of the machine software to make sure that the translation is correct, both from a technical and linguistic point of view (Figure 1).

In an ideal world, developers would have planned the localization of the machine texts right from the beginning. The programmers would have separated text and code and provided additional meta information. This is a prerequisite for a cost-effective localization process. However, in reality, companies often export the strings to be translated line by line, which makes it difficult for the translator to understand what belongs together and what the real meaning of certain expressions is. Some companies have recognized this problem and give the translator additional metadata to help. This work is time-consuming and sometimes requires a complicated series of conversion steps. Some programmers sort out and group the software strings according to their object, module, topic or function and save them in separate files or as Excel tables or spreadsheets. This means additional work both for the developer and for the translator, which could have been avoided if the developers had made their software localizable from scratch.

In general, the exported data will be translated with standard translation memory (TM) programs after the language service provider has separated text and code. This is where the first difficulty starts, depending on how the programming has been done and on how the data has been exported. It is not always a trivial task to separate the text and code from one another because a uniform standard pattern is required to discern the translatable text from the program code, which has to be protected (Figure 2).

Some exported files cause additional problems for the translator. A typical case is the dividing up of one single sentence over two or more lines, which are saved separately in the machine software. Example:

21066, OBJECT MUST BE IN THE RELEASED OR

21067, IN TRANSPORT STATUS.

Since not all languages have the same syntax, such sentences will in many cases lead to errors when TM systems insert mismatched units into a translation.
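One way to catch such cases before they reach the TM is to flag consecutive entries whose text does not end in sentence punctuation. The following is a minimal sketch, assuming the export uses the "numeric ID, comma, text" layout of the example above; the function names and the punctuation heuristic are illustrative assumptions, not part of any particular tool. Because many short interface strings carry no punctuation at all, the result is only a list of candidates for a human to review.

```python
import re

# Assumed export format: "<numeric id>, <text>" per line, as in the example above.
LINE_RE = re.compile(r"^\s*(\d+)\s*,\s*(.*\S)\s*$")


def parse_export(lines):
    """Return a list of (id, text) pairs; blank or unparseable lines are skipped."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            entries.append((int(match.group(1)), match.group(2)))
    return entries


def group_fragments(entries):
    """Group entries with consecutive IDs until one ends in sentence punctuation.

    Heuristic only: groups with more than one entry are candidates for a
    sentence that was split over several lines.
    """
    groups = []
    current = []
    for entry_id, text in entries:
        # A gap in the ID sequence always starts a new group.
        if current and entry_id != current[-1][0] + 1:
            groups.append(current)
            current = []
        current.append((entry_id, text))
        if text.endswith((".", "!", "?", ":")):
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups
```

Groups containing more than one entry can then be presented to the translator as a single unit, with the line split restored after translation.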

The next issue is the limited space available on the machine display for the translation. The translator receives instructions not to exceed a set number of characters or pixels per display line. But how can this be implemented? Either the translator manually checks the length of the translation line by line, which is quite complicated, or he or she can use a script, a macro or a dedicated application to ensure that the maximum available length hasn’t been exceeded. The tools used for that purpose range from simple Excel formulas that output the length of a text in a cell to complex routines calculating the width of text in pixels with parameters such as the letter width (narrow like i or wide like m) and font size.
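Both kinds of check can be sketched in a few lines. The example below is an illustration under stated assumptions: the three width classes and their pixel values are hypothetical, and real values would have to come from the font metrics of the actual display.

```python
# Hypothetical width classes for one font and size. Real pixel values must be
# taken from the font actually used on the machine display.
NARROW = set("iljtf.,;:!|'")
WIDE = set("mwMW@")


def pixel_width(text, narrow=3, normal=6, wide=9):
    """Estimate the rendered width of text in pixels using three width classes."""
    total = 0
    for ch in text:
        if ch in NARROW:
            total += narrow
        elif ch in WIDE:
            total += wide
        else:
            total += normal
    return total


def fits(text, max_chars=None, max_pixels=None):
    """Return True if text respects whichever of the two limits is given."""
    if max_chars is not None and len(text) > max_chars:
        return False
    if max_pixels is not None and pixel_width(text) > max_pixels:
        return False
    return True
```

A call such as `fits("Ventil geschlossen", max_chars=10)` returns `False`, which tells the translator that a shorter wording or an abbreviation is needed.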

A particular problem arises when the length restriction rules constantly vary. Here is an example of instructions for the translator:

“At the end of the lines there is an abbreviation and a number, e.g.: (sl/72). The abbreviation -sl- means that the lines must not exceed one single line. The abbreviation -ml- means that the translation may be spread over multiple lines. The number indicates the maximum line length in characters (in this example 72 characters). Spaces between quotation marks should be kept.”

The translatable string may look like this:

1873 : ('xmessage','search failed','''\n search operation could not be performed because: %(reason)s.\n''')(ml/80)

Fixing such problems is doable, but it is time-consuming, costly and requires programming skills not every individual translator has. Scripts or macros will tell the translator and the proofreader when they have to modify the translation in order to meet the developers’ requirements. If such requirements are not complied with, there is a risk that some texts will not be visible, which in turn may lead to a faulty operation of the machine/device or would trigger a costly round of corrections of the translation. How can a programmer recognize that the decisive word for on or off at the end of a German sentence has not appeared on the machine display because the translation is too long?
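A script for the instructions quoted above could read the marker at the end of each line and check the translation against it. This is a minimal sketch, assuming the marker always has the form (sl/NN) or (ml/NN) and that line breaks in the translation have already been converted to real newline characters; the function names are illustrative.

```python
import re

# Trailing marker such as "(sl/72)" or "(ml/80)", as described in the
# translator instructions quoted above.
MARKER_RE = re.compile(r"\((sl|ml)/(\d+)\)\s*$")


def parse_marker(line):
    """Return (mode, max_len) from the trailing marker, or None if absent."""
    match = MARKER_RE.search(line)
    if not match:
        return None
    return match.group(1), int(match.group(2))


def check_translation(translation, mode, max_len):
    """Return a list of problems; an empty list means the limits are met."""
    problems = []
    lines = translation.split("\n")
    if mode == "sl" and len(lines) > 1:
        problems.append("single-line string contains a line break")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_len:
            problems.append(
                f"line {number} has {len(line)} characters, limit is {max_len}"
            )
    return problems
```

Run over the whole file, such a check gives the translator and the proofreader a list of exactly those strings that must be shortened or re-wrapped.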

A particularly tricky space problem occurs when the client requires the translator to stick to the indentation of the source language. In older machines and tools, non-proportional (monospaced) fonts are used, meaning all the characters have the same width, and columns in tables are created simply by using blanks. Due to the differences in the length and number of words between languages, this type of requirement is particularly difficult to implement. Again, a translation service provider with know-how in dealing with scripts and programming tools is required to automate the compliance with such instructions.
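For tables built with blanks, compliance can be automated by padding each translated cell to the column width of the source. The sketch below assumes the column widths have already been measured from the source text and that every character occupies exactly one display cell; both function names are hypothetical.

```python
def fit_cell(text, width):
    """Pad text with blanks to exactly width characters; raise if it is too long."""
    if len(text) > width:
        raise ValueError(f"{text!r} needs {len(text)} characters, column has {width}")
    return text.ljust(width)


def build_row(cells, widths):
    """Join cells into one fixed-width row so that every column start stays aligned."""
    if len(cells) != len(widths):
        raise ValueError("number of cells and column widths differ")
    row = "".join(fit_cell(cell, width) for cell, width in zip(cells, widths))
    # Trailing blanks after the last column carry no layout information.
    return row.rstrip()
```

The error raised for an overlong cell is deliberate: it forces a decision on a shorter translation instead of silently shifting the following columns.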

The last major technical problem is the correct representation of special characters in foreign languages. Not all machine programs use Unicode. They are thus not able to handle double-byte characters like Chinese or Japanese. Many systems currently support at least all European languages, including Russian. However, some are still working with different code pages, so programmers and translators need to clarify beforehand which fonts and which encoding will be selected.
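Once the encoding has been agreed on, a simple check can list the characters of a translation that the chosen code page cannot represent. The sketch below relies only on Python's built-in codecs; the code page names cp1252 (Western European) and cp1251 (Cyrillic) are examples, not a statement about any specific machine.

```python
def unencodable_chars(text, encoding):
    """Return the distinct characters of text that the encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad
```

For example, `unencodable_chars("Größe", "cp1252")` returns an empty list, while a Russian string checked against the same code page returns every Cyrillic letter it contains.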

From a linguistic point of view, the localization of machine software is not an easy task either. Many software developers just send simple word lists to their translators. They expect them to churn out a good translation quickly even if the context is missing. If a word such as device or support stands alone, the task of translating is like playing Russian roulette. The first issue starts with the quality of the source text. Unlike traditional software or software documentation, the texts are generated by software engineers with little or no linguistic training. It is not uncommon to find grammar or spelling errors. To make matters worse, the software has generally been developed over a certain period of time by several programmers, and again and again there are inconsistencies like Compressed Air Valve On and Switch On Air Pressure Valve with the same meaning.

In software projects some terms are written differently depending on the program object they are used for (dialog title, field, message), are shortened differently or even have different meanings, as with the word support. In such cases, traditional TM systems can cause mistakes if existing translations are taken over from the TM in an uncritical manner.

In some situations, it is impossible to determine the exact meaning of an expression without context. What does the expression search term really mean? Is it a noun (term searched for) or a command (search for the term)? It would be helpful if the developers exported information to help the translator identify the object type associated with the string and to see which texts belong together. In any case, it is important and necessary for the translator to be able to ask questions (and to actually ask them) and for the client to name a contact person with a good knowledge of the software and of the product who can answer these questions in a competent manner.

Translators have to adapt their natural translation to make it fit the technical requirements of the client. Languages have different word orders, and this sometimes influences the way software texts are translated. For example, the developer may have inserted in the middle of a sentence an escape sequence such as \n to stand for a line break. The translator must then guess where to put the escape sequence for the line break in the translation. In some situations, the programmers are unaware of the linguistic rules of the target language and have inserted the variable erroneously. This can be the case if neighboring words like adjectives get different treatment depending on the gender of the noun, for example.
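Where a variable or line break ends up in the sentence is a linguistic decision, but whether it survived the translation at all can be checked mechanically. The following sketch assumes Python-style named placeholders such as %(reason)s and a line break exported as the two characters backslash and n; other projects will need other patterns, and the function names are illustrative.

```python
import re
from collections import Counter

# Matches named placeholders such as %(reason)s and the literal two-character
# escape sequence backslash + n. The pattern is an assumption for this example.
TOKEN_RE = re.compile(r"%\(\w+\)[sdif]|\\n")


def protected_tokens(text):
    """Count the protected tokens found in a string."""
    return Counter(TOKEN_RE.findall(text))


def token_mismatches(source, target):
    """Return {token: (count_in_source, count_in_target)} for differing counts."""
    src = protected_tokens(source)
    tgt = protected_tokens(target)
    return {
        tok: (src[tok], tgt[tok])
        for tok in src.keys() | tgt.keys()
        if src[tok] != tgt[tok]
    }
```

An empty result means every placeholder and escape sequence of the source appears the same number of times in the translation; it says nothing about whether the position is linguistically correct.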

Because of the limited space available, it is often necessary to use abbreviations or to juxtapose a series of words. How is it possible to shorten the expression Shutoff Valve Not Closed to only ten characters? This leads to constructions like ShOffVlvNc that even the machine operator has difficulty understanding in the original language. Some abbreviations take different meanings depending on the situation, such as Pos, which was used in a specific project both for positive and for position.

For the same reasons, words or commands are juxtaposed with no clearly visible sense. An expression such as Dedusting Solenoid Valves On Duration Timer Setpoint (x0,1s) remains a closed book even for some clients. Here the translator needs a clear understanding of the way such expressions are created: What is the pattern? Which information comes at the beginning and at the end of the message? Have typographical elements like capital letters been used to mark a group of words? In order not to confuse the final user with different patterns, the client should define in a style guide the linguistic rules for generating and coining strings and for writing messages in a uniform way. The same applies to the coinage of abbreviations.

These technical and linguistic requirements mean a tedious chore for the average translator and often require many hours of manual work. The development of appropriate scripts or checking routines is often an effective and reliable technical solution. However, this is only possible with associated development efforts and can only be done for projects with a certain volume. Unfortunately, small machine software projects will continue to require substantial manual work.

Several aspects of the text can be checked with the quality assurance features of TMs or with independent quality assurance programs. These items are the consistent use of a predefined terminology, provided of course that a terminology was created, and the correctness of numbers in the machine software. But not everything can be verified with the assistance of software. Since some of the translations have been produced without context information, the final output should normally be reviewed after the localized version of the machine software has been compiled because a sizeable part of the strings is context dependent.
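The two automated checks mentioned here, terminology and numbers, can be illustrated with a short sketch. It assumes a glossary given as a plain mapping from source term to target term and uses simple substring matching, so inflected forms will be missed; the names are hypothetical and the code does not reproduce the behavior of any specific quality assurance tool.

```python
import re
from collections import Counter

# Integers and decimals with either a point or a comma as decimal separator.
NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)?")


def numbers_in(text):
    """Return the numbers in text as a multiset, with commas normalized to points."""
    return Counter(n.replace(",", ".") for n in NUMBER_RE.findall(text))


def number_mismatch(source, target):
    """Return True if source and target do not contain the same numbers."""
    return numbers_in(source) != numbers_in(target)


def missing_terms(source, target, glossary):
    """Return glossary target terms expected in target but not found there."""
    missing = []
    for src_term, tgt_term in glossary.items():
        if src_term.lower() in source.lower() and tgt_term.lower() not in target.lower():
            missing.append(tgt_term)
    return missing
```

Normalizing the decimal separator avoids false alarms when 0.1 in the source correctly becomes 0,1 in a German or French translation.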

Few clients have thought carefully about the entire localization process, and most have therefore not planned sufficient time or budget for this step. Unfortunately, this means that the translated texts are only corrected at a later stage, when there is a complaint or when the translated strings have already reached the final customer. Some manufacturers of machine software have recognized this problem and developed their own applications to visualize the localized dialogs during the review process. This indeed allows the reviser to see all the translated texts that appear simultaneously in a dialog. Whenever clients have developed dedicated applications to localize their machine software and translate the software strings, the performance and functionality of these applications generally lag far behind the scope and performance of professional localization tools, which have been developed over many years just for that purpose.

For all the reasons above, one should consider the use of localization programs for machine software and embedded systems as an alternative to traditional TM systems. These localization programs, for example, offer the following functions:

Read available metadata from the files generated by the machine software (CSV, TXT, XML) such as the maximal length of the string in characters or pixels

Check the maximum number of characters or pixels per display line

Develop custom checks or file conversions with the help of the integrated macro editor

Provide context information to the translator in the form of metadata, images or links to external sites

Associate translation units with IDs (when available) and therefore make them more suitable for context-dependent multiple translations.
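The last function in this list can be illustrated with a toy translation store that prefers an ID-specific translation and falls back to a text-only match. This is a sketch of the idea only; the class, its methods and the string IDs are hypothetical and do not reproduce the data model of any commercial localization tool.

```python
class IdAwareMemory:
    """Toy translation store that prefers a translation tied to the string ID."""

    def __init__(self):
        self._by_id = {}
        self._by_text = {}

    def add(self, unit_id, source, target):
        """Store a translation for this ID; the first one also becomes the fallback."""
        self._by_id[(unit_id, source)] = target
        self._by_text.setdefault(source, target)

    def lookup(self, unit_id, source):
        """Return (translation, is_id_match); translation is None if unknown."""
        key = (unit_id, source)
        if key in self._by_id:
            return self._by_id[key], True
        return self._by_text.get(source), False
```

With an abbreviation such as Pos, which can stand for both position and positive, the ID match returns the right translation for each string, while a fallback hit (`is_id_match` is `False`) signals that the translator should check the proposal in context.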

Repair is always more expensive than doing it right the first time. Translators and developers can save much trouble and work when they cooperate in a long-term perspective and plan together all phases of the localization process of a machine application.