Right-to-left localization for mobile devices

By Amr Zaki November 6, 2012

During the past few years, mobile device applications have boomed, and the need to localize these applications into right-to-left languages has become crucial. The question is, why is localization into these languages important?

In fact, there are a number of reasons. For one thing, there is a big market share to be had in this area in general. Secondly, there is huge potential in new categories specifically, such as banking and health care. But social and news applications still have a long way to go.

But perhaps we should first explain what right-to-left languages are. Right-to-left writing is among the oldest forms of written script: it started with engraving on stone and wood from right to left, which seemed logical because most people are right-handed. There are three ways of writing scripts worldwide: left-to-right, right-to-left and vertical. Arabic script languages (which include Urdu and Farsi) and Hebrew are by far the most widely used living right-to-left languages, and their scripts are often called bidirectional (bidi), because they can involve left-to-right text as well, such as numbers and embedded Latin words. Although some readers may think that Arabic and Hebrew scripts are similar because they are both right-to-left, there are major differences between them, as may be seen in Figure 1.

Many people may think this is somewhat limiting, and that Arabic script is used only in the Middle East and Arab regions. In actuality, it is also used for languages in many non-Arab regions, such as Urdu, Punjabi and Sindhi in Pakistan, Dari in Afghanistan, Uyghur in China and Persian in Iran. Like Hebrew, Arabic is also used worldwide by a subsegment of the population.

When localizing mobile devices into Arabic and Hebrew, there are four major areas that need full attention. The first of these is bidi software enabling. Keep in mind that, besides being bidi, these languages also involve complex scripts. Arabic, Hebrew, Thai and Hindi all fall into the complex script category because their characters take different forms depending on the position of the glyph or character.

During the 1980s, many applications were written for the Latin alphabet and hence did not support complex script languages on DOS. This opened the way for local software companies to add support for those languages on DOS: they used memory-resident programs and generated supported fonts using different code pages. At that point, displaying a string in Arabic wasn’t easy, and developers had to go through several stages. The first was dividing the string into segments, each consisting of characters of a similar script type. If E stands for an English character, ES for a European separator (such as a plus or minus sign) and A for an Arabic character, a string consisting of E, E, E, E, ES, ES, A, A had to be divided into three segments: E, E, E, E; ES, ES; and A, A. Each segment would then go through a layout process. For example, an Arabic string stored in memory as 1, 2, 3, 4 would need to be laid out as 4, 3, 2, 1. The last stage was forming the shape of each glyph according to its position inside the string.

These stages are now handled by what is called a bidi engine (Uniscribe in Windows), and this engine must be supported or enabled for those languages to be read correctly.
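The segmentation and layout stages described above can be sketched in a few lines of Python. This is only an illustration, not how any particular bidi engine works: it assumes that the Unicode bidi classes L, ES and AL reported by the standard unicodedata module stand in for the article's E, ES and A, and the function names are hypothetical.

```python
import unicodedata
from itertools import groupby


def split_into_runs(text):
    """Group consecutive characters that share the same Unicode bidi class."""
    return [
        (bidi_class, "".join(chars))
        for bidi_class, chars in groupby(text, key=unicodedata.bidirectional)
    ]


def naive_visual_order(segment):
    """Reverse a purely right-to-left segment from memory order to display order.

    A real engine also handles combining marks, numbers and nesting;
    this sketch deliberately ignores all of that.
    """
    return segment[::-1]


# Four Latin letters, two European separators, two Arabic letters (beh, teh).
sample = "ABCD+-\u0628\u062A"
runs = split_into_runs(sample)
# runs holds three segments, tagged "L", "ES" and "AL" in that order.
```

A complete implementation would follow the Unicode Bidirectional Algorithm, which resolves separators and neutrals from their context rather than treating each run in isolation.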

If bidi is not enabled, all the text will be displayed as empty boxes or without shaping, because the font or other required components are missing. These components might include the Arabic or Hebrew glyphs and the shaping engine that is responsible for contextualization. Enabling software to work from right to left was previously a complicated task, but with newer software development kits and tools it has become easier. There are two ways to enable applications to switch their starting point from the left to the right: code and resources.

Code enabling requires some development work to modify any application programming interface calls involved, while resource enabling works at the resource level using the resource editors provided by different companies. Using those tools, a user can create a mirrored window or dialog, in which the window or dialog appears as if you are looking at it in a mirror. Figure 2 shows the differences between the western dialog on the left and the right-to-left one on the right. One of the challenges is that many of these mobile devices don’t have Arabic or Hebrew script support; another is the slower device performance when bidi is enabled.
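The geometry behind a mirrored dialog is simple enough to sketch. The example below is a minimal illustration under assumed names (Rect and mirror_horizontally are hypothetical, not part of any toolkit): each control keeps its size and vertical position, and its horizontal offset is measured from the opposite edge of the container.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: int       # distance from the container's left edge
    y: int
    width: int
    height: int


def mirror_horizontally(rect, container_width):
    """Return the rectangle as it would appear in a mirrored (right-to-left) layout."""
    return Rect(container_width - rect.x - rect.width, rect.y, rect.width, rect.height)


# A label that sits 10 units from the left edge of a 320-unit-wide dialog
# ends up 10 units from the right edge after mirroring.
label = Rect(x=10, y=20, width=80, height=24)
mirrored = mirror_horizontally(label, container_width=320)
```

Mirroring twice returns the original rectangle, which makes for a convenient sanity check when testing a layout. Images and shadows are a separate matter, since moving a control does not flip the artwork drawn inside it.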

Shaping

Shaping in Arabic is one of the tricky issues that nonnative developers might face. Without it, the characters can look like Arabic script but will not be readable. Arabic characters should appear differently depending on their position in a word, and will often be linked together. Nonnative speakers can still detect incorrect shaping: in correctly shaped text, the Arabic characters are joined together in about 90% of the lines. If the reader sees that all the glyphs are separated, with space between them like English characters as in Figure 3, then there was some problem and shaping is not working.

The reason behind this problem is the lack of bidi engine support and of a supported font.
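To see what position-dependent forms mean in practice, the sketch below looks at a single letter. It relies on the fact that Unicode encodes the four contextual forms of the Arabic letter beh as separate compatibility characters, and uses Python's standard unicodedata module to name them and to map each back to the one letter that is stored in memory. The helper names are hypothetical, and modern shaping engines normally select glyphs inside the font rather than substituting these code points.

```python
import unicodedata

# Contextual forms of ARABIC LETTER BEH (U+0628) from the
# Arabic Presentation Forms-B block.
BEH_FORMS = {
    "isolated": "\uFE8F",
    "final": "\uFE90",
    "initial": "\uFE91",
    "medial": "\uFE92",
}


def describe_forms(forms):
    """Return the official Unicode name of each positional form."""
    return {position: unicodedata.name(glyph) for position, glyph in forms.items()}


def base_letter(glyph):
    """Map a presentation form back to the abstract letter via NFKC normalization."""
    return unicodedata.normalize("NFKC", glyph)


names = describe_forms(BEH_FORMS)
```

All four forms normalize to the same stored letter, which is why text with no shaping support shows every letter in one unconnected form.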

Another problem that currently appears in many device manuals, and is not at all easy for nonnative speakers to spot, is the reversed string. The string consists of many words that are each displayed correctly, but the words are laid out from left to right, so the sentence reads backwards. For example, “The quick brown fox jumps over the lazy dog” appears as “dog lazy the over jumps fox brown quick The.” This problem comes from certain localization tools that automate localization from the source language to the target language (Arabic, in this case). To avoid those problems, the strings should be reviewed by a native speaker before final delivery.

Reading order

The second issue that readers can face is the reading order. Reading order can become a problem when the text mixes characters from different scripts, such as English and Arabic or English and Hebrew, or combines Arabic or Hebrew with neutrals such as question marks, spaces or similar characters. In most cases this issue can’t be detected by nonnative speakers, although they can spot some of the problems.

To fix the issue of the reading order we have to go deeper to understand something called bidi Unicode markers. Unicode is a computing industry standard for the consistent encoding, representation and handling of text expressed in most of the world’s writing systems.

Unicode markers are special, invisible formatting codes, as in Figure 4, that set the base direction for the bidirectional algorithm or override it in plain text. For example, the left-to-right embedding marker (LRE), U+202A, starts a left-to-right embedding.
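For reference, the most common of these markers can be listed programmatically. The sketch below uses Python's standard unicodedata module to print the code point and official Unicode name of each marker; the dictionary and the describe helper are illustrative names, not part of any library.

```python
import unicodedata

# Invisible bidi formatting characters, keyed by their usual abbreviations.
BIDI_MARKERS = {
    "LRM": "\u200E",
    "RLM": "\u200F",
    "LRE": "\u202A",
    "RLE": "\u202B",
    "PDF": "\u202C",
    "LRO": "\u202D",
    "RLO": "\u202E",
}


def describe(abbreviation):
    """Return the code point and Unicode name for a marker abbreviation."""
    ch = BIDI_MARKERS[abbreviation]
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for abbreviation in BIDI_MARKERS:
    print(abbreviation, describe(abbreviation))
```

Because these characters are invisible, writing them as escape sequences in source code, as done here, is usually safer than pasting them in literally.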

One of the common problems when localizing to any right-to-left language is the location of the full stop or period. The problem appears when the reading order of the field is not set to right-to-left or the localizer can’t set it. In this case, using the Unicode right-to-left marker at the end of the string will force the full stop to be positioned in the correct place. A sample of right-to-left marker (RLM) behavior to fix the place of the full stop is shown in Figure 5.
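In code, the full stop fix amounts to appending one invisible character. The following is a minimal sketch, assuming the localizer can post-process the string; with_trailing_rlm is a hypothetical helper name and the sample text is only a placeholder Arabic word followed by a full stop.

```python
RLM = "\u200F"  # RIGHT-TO-LEFT MARK: invisible, strongly right-to-left


def with_trailing_rlm(text):
    """Append RLM so that trailing punctuation sits between two right-to-left characters."""
    return text if text.endswith(RLM) else text + RLM


# Placeholder Arabic word followed by a neutral full stop.
sentence = "\u0645\u0631\u062D\u0628\u0627."
fixed = with_trailing_rlm(sentence)
```

The full stop is a neutral character, so in a left-to-right field it would otherwise follow the field's direction. With a right-to-left character on each side of it, the bidirectional algorithm treats it as part of the right-to-left text.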

The second common reading order problem is the location of parentheses. A common display problem is that the two parentheses are displayed in the same direction (both closed or both open) or, as in the case shown in Figure 6, that mixing right-to-left languages with English causes the word in parentheses to be positioned in the wrong place. This causes the string to be misunderstood, and hence it will not be readable. Using the Unicode right-to-left embedding (RLE) and pop directional formatting (PDF) markers together will fix the problem: the RLE marker forces the direction of the string to be right-to-left, while the PDF marker ends the forced reading order.
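The RLE and PDF pairing can be sketched as a small wrapper, together with a check that every opening marker has been closed, since a missing PDF is an easy mistake to make with invisible characters. Both function names are hypothetical, and the check is a simple heuristic rather than a full validation of the bidirectional algorithm.

```python
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
OPENERS = {"\u202A", "\u202B", "\u202D", "\u202E"}  # LRE, RLE, LRO, RLO


def embed_rtl(text):
    """Wrap text so it is laid out as a right-to-left embedding."""
    return f"{RLE}{text}{PDF}"


def embeddings_balanced(text):
    """Return True if every embedding or override is closed by a matching PDF."""
    depth = 0
    for ch in text:
        if ch in OPENERS:
            depth += 1
        elif ch == PDF:
            if depth == 0:
                return False  # PDF with nothing left to close
            depth -= 1
    return depth == 0


wrapped = embed_rtl("(Bluetooth)")
```

A check like embeddings_balanced could run over translated resource strings before delivery to catch markers that were dropped during editing.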

A third problem is the location of European number separators such as plus, minus and so on. When these are used with numbers within right-to-left strings, the sign can end up in the wrong place and be misleading.

Another example is concatenated strings, such as the Arabic string roughly translated as “Updated %S which expired %S” in Figure 7. Each “%S” will be replaced by a string such as a date, and this will cause a reading order problem that needs to be solved on a case-by-case basis.
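One possible way to handle such placeholders, sketched below, is to wrap every inserted value in LRE and PDF markers so that it is laid out as a single left-to-right unit inside the right-to-left sentence. This is an assumption about one workable approach, not the fix used in Figure 7: fill_placeholders is a hypothetical helper, the dates are invented sample values, and the English rendering of the template stands in for the Arabic string.

```python
LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING


def fill_placeholders(template, *values, placeholder="%S"):
    """Replace placeholders in order, wrapping each value in LRE...PDF."""
    parts = template.split(placeholder)
    if len(parts) - 1 != len(values):
        raise ValueError("number of values does not match number of placeholders")
    pieces = [parts[0]]
    for value, tail in zip(values, parts[1:]):
        pieces.append(f"{LRE}{value}{PDF}")  # keep the value together as one unit
        pieces.append(tail)
    return "".join(pieces)


# Invented sample dates; a real template would be the Arabic string.
message = fill_placeholders("Updated %S which expired %S", "2012-11-06", "2012-10-01")
```

Whether left-to-right embedding is the right choice depends on what is inserted, so each placeholder still has to be reviewed case by case.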

These examples are common problems that happen due to reading order. Some problems are easy to fix, while others will require complete understanding of the markers in question.

Controls

Controls are difficult to manage on mobile devices, as some of them are drawn manually to provide different effects, such as a glow behind text, fading colors and so on, while others need redesigning to work in a right-to-left environment. In addition, hardware limitations sometimes prevent designers from changing the position of text that is mapped to physical hardware buttons.

Those types of controls also need to be analyzed on a case-by-case basis. Some cases will need graphic designers to flip images or change shadowing for a button. In Figure 8, each button has a background image, while the horizontal selection line completes the effect behind the icon. In our case the mail icon has the wrong image background direction, and the horizontal selection also has a flipped background. The effect as seen in the correct case appears in Figure 9.

Another issue is the concatenation of images to avoid icon shadow location. In the Figure 10 screenshot, the conference icon has a left rectangle to show whether the status of the user is busy, free or something else. In right-to-left languages, the icon should be displayed on the other side of the user interface, and the gray shadow should be displayed in the other direction.

Takeaways

There are several takeaway points here. The good part is that the market is huge, as Arabic alone covers about 21 Arab countries, plus the non-Arab countries that use Arabic script as described previously. Because the market is not yet saturated, there are many opportunities and the potential for high revenue.

The difficult part is that localizing into one of the right-to-left languages involves a lot of constraints, as outlined, and therefore there is a need for both manual checks and automated testing tools that can detect some of these bidi issues. Certainly, a high level of technical expertise is required to get the best quality.