Localization maturity in emerging market languages

By Conor Bracken April 9, 2014

Over recent decades, localization processes, metrics and tools have evolved and standardized to the point where the languages may change, but in general the process stays the same. In the 1990s, localizers became aware of the challenges of “double-byte” languages and Asian scripts, but the advent of Unicode solved many of these issues. Bidirectional languages such as Arabic have long been known to present special difficulties, but those peculiarities aside, the localization process is seen as a template to be applied to any language.

The rise of emerging markets is bringing new languages into the spotlight, and readers of this magazine are regularly introduced to the idiosyncrasies of less-common languages. Rather than focusing on the unique characteristics of each language, it is more useful to recognize that all emerging market languages share common characteristics, and that there is a continuum of development as resources, tools and processes adapt to the needs of localizers.

The definition of an emerging market is vague, with the International Monetary Fund, The World Bank, stock markets and other organizations maintaining different lists. For translation, the focus is primarily on Asian and Eastern European languages, since South America and Africa are at least partially covered by familiar languages such as Spanish or French for many commercial purposes.

The impact of emerging markets on global economics is twofold. First, the massive buying power of the populations of China, India, Indonesia and elsewhere is being unlocked through localization. Second, and more importantly, the emergence of these economies seems set to continue and new countries are opening up. The repeated boom and bust cycles of the past in Africa and Latin America have been replaced by sustained growth in Asia and Eastern Europe.

At first glance, Asia and Eastern Europe couldn’t be more different. The political systems and dominant religions are extremely diverse, and populations vary enormously. On closer inspection, however, the two regions share certain characteristics where localization is concerned.

For example, non-Latin scripts dominate. Lack of technical language support, line-break issues, segmentation difficulties, expansion and font size problems, and incompatibilities are very common. Computer-aided translation (CAT) tool support is missing or incomplete. Even for a more established language such as Thai, there is no support for using it as a source language. For the Indic languages and soon-to-emerge languages such as Burmese and Khmer, CAT tool support is likewise poor. Unicode exists, but is not necessarily adopted: for many scripts it has not been fully embraced, either because the input method is new or unpopular or because there is a lack of fonts. Simply requesting that the localizer “use a Unicode font” does not solve font or input method issues.

Also, freelance translators in these regions often don’t use CAT tools. The Trados freelance edition costs several months’ wages for an educated Indian office worker, and with the benefits of CAT tools accruing largely to the client, the flat model of CAT tool pricing fails in emerging markets. True, Wordfast offers a special price for most emerging markets, but in general CAT tool pricing is not realistic when compared with a translator’s earnings there. Add to this the fact that less common languages naturally have far fewer professional translators, many of whom work regular jobs. There tends to be a lack of translator associations and accreditations, and even translation degrees may begin with English 101 and assume no working ability in English.

There is often inconsistent terminology and a lack of standards. Even basic IT terminology can be wildly inconsistent in emerging markets. As an example, Microsoft tends to eschew loanwords and transliterations in favor of pure translations, with the conviction that where it leads, others will follow. Oracle, on the other hand, tends to survey what is actually being used by the educated minority of the population who use computers every day, so its glossaries are full of loanwords from English and transliterations. Both approaches are valid, but the outcome is that even the most basic IT terms may be translated differently across companies and end-users. When this is compounded by issues around dialect (such as North versus South Vietnamese), glossaries tend to be much more problematic in emerging markets.

For these reasons and more, standard metrics for quality and productivity may not apply. What is the acceptable number of spelling errors per 1,000 words? One answer generally cannot be applied to all languages. With Thai, for example, having three times as many letters as English and with its spellcheckers unable to work efficiently, does it really make sense to have one metric for all languages? It’s the same for productivity. Every new project manager learns that translators do 2,000 to 2,500 words per day, and editors do 5,000 to 6,000 words. But in a country with immature, inconsistent terminology, a lack of experience with CAT tools, and a prevalence of part-time translators, applying the same metric does not make sense.

Internet connectivity and affordability are also lower. With files getting larger and cloud-based CAT tools becoming common, it is worth reflecting that users in emerging market countries often face the double whammy of slow speed and high cost. In China, internal bandwidth is fast, but the international firewall can slow connections to overseas servers.

Cultural and religious sensitivities

These are practical localization issues that emerging market languages face, but there are also increased cultural/religious sensitivities. Almost all the emerging markets score badly on the Press Freedom Index, and are a minefield for companies that assume that the same approach that worked fine in Europe, followed by Japan and Korea, will apply elsewhere.

Consider, for example, that YouTube was banned in Thailand several times in recent years because videos deemed offensive to the country’s revered monarch were posted online. The Thai government demanded that the videos be removed before it made the service accessible again.

Religion is also a factor. In 2013, Malaysian authorities banned a planned concert by the American pop singer Kesha after deciding it would hurt cultural and religious sensitivities due to explicit references to sex and liquor in the lyrics. In 2012, a British woman who lived in Dubai was jailed for three months for inappropriate behavior with her boyfriend.

Racial profiles are also different. In 2009, Microsoft in Poland switched a black man in a photo for a white one. This was quickly discovered and caused a minor scandal when the picture, showing employees sitting around a desk, appeared unaltered on the firm’s US website.

After nearly 20 years of working in emerging market languages, I am no longer surprised by requests to supply translators who can use a certain CAT tool with predefined experience, qualifications and metrics.

Localization professionals in developed countries talk about certain emerging market languages as being problematic. Actually, it’s not that emerging market languages are problematic — it’s the assumption that all languages can be localized with the same process that is fundamentally flawed.

Rather than assuming that if a process works for the first 15 languages, new languages should be shoehorned into the same process, localizers need to address the specific issues of emerging market languages, starting with an assumption that there will be technical and practical difficulties in these languages.

Recommendations

Testing linguists is more important than evaluating résumés and professional qualifications. Most emerging market translators have picked up their skills through real-world, on-the-job training and practice rather than formal education. With freelance translation often paying much better than typical office jobs, the incentive to “game” the system is much higher. Many candidates have padded or outright fake résumés. Successful translators may outsource their work to a lower-cost translator, or they may take on more work than they can finish. Staged deliveries and sample checking can be used to detect substandard work at the beginning of a project rather than at the end.

Schedules and metrics need to be adapted for increased setup time and lower productivity. Terminology issues are inversely proportional to economic development. It’s worthwhile to allow extra time for recruitment, testing, glossary creation, and training in CAT tools and quality assurance tools. While forcing emerging market languages onto the same track as other languages may be possible, costs are higher and there are greater risks to quality and deadlines.
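To make the scheduling point concrete, here is a minimal sketch of how a standard estimate changes once setup time and lower throughput are factored in. Only the 2,000 words per day figure comes from this article. The project size, the productivity factor, the setup allowance and the function name `estimate_days` are hypothetical values chosen for illustration, and the sketch assumes a single translator.

```python
import math

def estimate_days(word_count, words_per_day, productivity_factor=1.0, setup_days=0):
    """Rough working-day estimate for one translator.

    productivity_factor scales the standard daily output (1.0 = standard metric);
    setup_days covers recruitment, testing, glossary work and tool training.
    """
    effective_words_per_day = words_per_day * productivity_factor
    return setup_days + math.ceil(word_count / effective_words_per_day)

# Hypothetical 50,000-word project at the standard 2,000 words per day.
baseline_days = estimate_days(50_000, 2_000)

# Same project, assuming (hypothetically) half the standard throughput
# and ten extra days of setup for an emerging market language.
adjusted_days = estimate_days(50_000, 2_000, productivity_factor=0.5, setup_days=10)

print(baseline_days, adjusted_days)
```

With these illustrative inputs the estimate grows from 25 to 60 working days. The actual factor and setup allowance would need to come from a specialist in the language in question.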

By the same token, quality metrics and expectations may need to be adapted. With a limited resource pool, nonstandardized terminology and issues around non-Latin scripts, emerging market languages have more problems with quality. In a rapidly developing economy, the language also changes between generations faster than in English. A male product manager in his fifties may have different ideas on what is correct formal language compared to a female translator in her late twenties.

The choice of CAT tool should also be based on real-world needs, not the existing model for mainstream European and Asian languages. Enforcing Trados usage and demanding Trados formats as a deliverable disqualifies the vast majority of translators in emerging markets. Instead of demanding compliance with a specific tool, it’s much better to be prepared to supply the tool to linguists, as well as train them to use it. A part-time translator earning $0.03 per word may be well-qualified, but he or she certainly can’t afford to pay out $825 for a Trados Freelance license.
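The arithmetic behind that last point can be worked through in a few lines. The sketch below uses only figures quoted in this article: the $825 license, the $0.03 per-word rate and the 2,000 to 2,500 words per day standard. The variable names are illustrative, and the calculation assumes every cent earned goes toward the license, which no translator could afford to do.

```python
# Work in whole cents to avoid floating-point rounding on currency.
LICENSE_COST_CENTS = 82_500   # $825 Trados Freelance license
RATE_CENTS_PER_WORD = 3       # $0.03 per word

# Ceiling division: words that must be translated to cover the license.
words_needed = -(-LICENSE_COST_CENTS // RATE_CENTS_PER_WORD)

# Full-time working days at the standard daily output range.
days_at_2500 = words_needed / 2_500
days_at_2000 = words_needed / 2_000

print(words_needed, days_at_2500, days_at_2000)
```

Under these assumptions the license corresponds to 27,500 translated words, or roughly 11 to 14 full working days of gross earnings. For a part-time translator, the calendar time would be considerably longer.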

Thus, applying a localization process model developed for mainstream languages as a template for emerging market languages is likely to cause failure and frustration. Consult specialists in the planning stage rather than presenting a schedule and metrics as a fait accompli, because these will be accepted and then broken.

As emerging markets become economically viable for localization, they move along a continuum of practical and technological maturity. Languages that are lower on the totem pole of commercial viability and further in both geographical and linguistic distance from Europe and North America will inevitably have proportionately greater challenges.