CEE languages: Challenges and opportunities

Central and Eastern Europe (CEE) is a region of incredible linguistic diversity. In an annual examination of online language markets, Common Sense Advisory (CSA Research) identified 19 languages in the region with significant online economic activity. These languages are (in order of their economic potential in 2017): Russian, Polish, Czech, Hungarian, Romanian, Slovak, Croatian, Lithuanian, Slovenian, Ukrainian, Serbian, Bulgarian, Albanian, Latvian, Belarusian, Estonian, Georgian, Armenian and Macedonian. If we include the languages of the former Soviet Union in Central and Western Asia, three more appear on our list: Kazakh, Azerbaijani and Turkmen.

These languages hail from three major families (Indo-European, Finno-Ugric and Turkic) and use two major writing systems (Roman and Cyrillic, as seen in Figure 1). In addition, they appear in very different political environments, ranging from the liberal market economies of the European Union (EU) to the more tightly controlled economies of some former Soviet republics.

CEE countries using this extended list of languages accounted for 6.5% of global online accessible gross domestic product (what we call “eGDP”) in 2017, roughly on par with the economic opportunity of German. Treated as a group, these CEE languages would rank higher than any single language except English, Simplified Chinese, Japanese and Spanish. Individually, however, only Russian appears in the top tier of online languages identified by CSA Research. The region’s diversity of linguistic heritage, writing systems, history and politics fragments it, complicating any attempt to create a regional language strategy. No cluster, such as West Slavic (Czech, Polish and Slovak), provides the economic clout associated with the classic French, Italian, German and Spanish (FIGS) approach to the “big” languages of Western Europe.

Figure 2 examines the top languages in the expanded Central and Eastern Europe region. Our data shows that the CEE share of total world eGDP will decline over the coming decade, even as the local economies grow in absolute terms. Two of the region’s languages, Kazakh and Romanian, will grow in economic importance as they move into Tier 2 of CSA Research’s list of top online language markets. Other regions not shown in Figure 2, particularly South and Southeast Asia, will exhibit much more substantial increases over the same period and climb the rankings. Because our analysis of languages is based on cumulative global GDP, the growing strength of those Asian economies overshadows the slower growth of the CEE countries.

However, these figures do not tell the whole story. CSA Research finds that website support for some of these languages is disproportionate to their economic potential. We identified five factors that corporations consider when deciding whether to support Central and Eastern European languages on their websites:

•European Union membership has its benefits. Official languages of the European Union tend to get a bump up in localization coverage. In 2016, CSA Research revealed that Polish, Czech, Hungarian, Bulgarian, Slovak, Romanian and Estonian all appeared significantly more often on the websites of leading corporations than other languages with similar online audiences and economic potential. This factor is most pronounced for smaller EU languages in general.

•Localization decisions factor in political risk. CSA Research finds that political uncertainty and fragmentation of language communities across multiple countries lead to perennial underinvestment in some markets. Increasingly nationalist regimes in countries such as Poland and Hungary also raise a red flag for many international enterprises. These factors may lead enterprises to put the brakes on investment in CEE languages. Current concerns in the United States will similarly slow investment in localization into Russian.

•Unicode doesn’t solve everything. Even though Unicode has been available for decades, languages written in non-Roman scripts still receive a penalty in localization. Many companies still consider them to be more difficult than they are. This affects those written in Cyrillic script: Russian, Kazakh (which is officially transitioning to Latin script), Turkmen and Belarusian. Armenian and Georgian face an even steeper uphill battle due to the combination of a language-specific writing system and a small population of speakers. Similarly, many Central and Eastern European languages have complex grammars that blunt the effectiveness of language technologies such as machine translation.

•High piracy rates deter software developers. Some of the languages in the region suffer from rampant piracy (as high as 90% for Georgian) that deters investment. Unfortunately, lack of localization and piracy reinforce each other, creating a negative cycle that keeps some languages trapped in an underinvested state.

•Familiarity helps. When executives decide which countries to expand into, they do not always decide based on objective factors. Instead, family background, languages studied in college, or even where they have gone on holiday can loom large. Here again, languages from larger countries in Europe with prime tourist destinations are net beneficiaries, while smaller ones languish (Figure 3).

Although these factors help some languages in the region and hold others back, the region as a whole represents a solid market with long-term growth potential. The EU membership of some of the region’s countries and their proximity to large economies make it a good expansion opportunity for enterprises already present in Europe.