Korean introduces new challenges to localization

By Hyelim Chang June 29, 2014

Korean is spoken by over 75 million people worldwide, just over 1% of the total world population. This makes it one of the top 20 languages by number of native speakers around the globe. However, when we enter the realm of the web, its presence is even greater. Korea is a highly wired country, with 40 million internet users, and jumps way up on the list to tenth among languages used on the web.

The Republic of Korea is also a leader in electronics and telecommunications. It is home to global brands such as Samsung, Hyundai and LG. The country is well known as a test-bed for these kinds of companies, with a large community of early adopters hungry for anything new and high tech, and eager to try out new technology, be it hardware or software. Its presence as a cultural power is also reinforced by Korean-made movies, dramas and games that attract fans from overseas.

These factors would likely make you think that localization is a widespread, well-developed industry in Korea — a been-there-done-that kind of field, with trials made and errors fixed as Korean-made goods and services swarm the world and global businesses target eager consumers in the tech-hungry country.

Quite to the contrary, localization is still relatively unknown and underdeveloped. Not a single major localization conference has taken place in the country, for example. Ask professional translators whether they have heard of “localization,” and only a few would answer yes. Localization is still not mentioned in major translation schools, let alone included in the curriculum. This is in stark contrast to the many educational institutions in other countries that offer localization courses and degrees.

Since Google set up a localization team in Seoul, we have been making conscious efforts in the market: raising awareness of localization among users, attracting talented people to the fascinating world of localization, and tackling the unique challenges we face when localizing into Korean.

A lonely language

Korean is a loner. It is often classified as a language isolate, meaning it does not share a common ancestor with any other language. A few consider it, somewhat controversially, to be part of the Altaic language family of central Eurasia. Many see Korean as part of the CJK (Chinese, Japanese, Korean) group and assume the three are closely related linguistically. Korean was indeed heavily influenced by Chinese culture, and this can be seen in the lexicon of the language. Roughly 57% of Korean vocabulary is Sino-Korean, that is, derived from Chinese. However, the two languages are completely different in structure. Chinese has an SVO (subject-verb-object) word order, while Korean has SOV. Korean and Japanese, on the other hand, share SOV word order but differ clearly in phonology. Korean tends to follow the rule of vowel harmony and has a complex vowel system of ten vowels and 11 compound vowels, while Japanese has no vowel harmony and is not particularly rich in vowels. Many linguists argue that the similarities between Korean and Japanese resulted from frequent and intense language contact, and that the two have no genetic relation.

So what makes Korean so different that it does not have any brothers or sisters among the 7,000 languages spoken around the world? A variety of things, but for many foreign language learners, postpositions (or particles) are arguably the most distinctive grammatical feature. Some twenty different particles are attached to words to indicate their grammatical roles in a sentence. In English, whether a noun is a subject or an object is mainly indicated by its position in the sentence:

Anna called Billy.

Billy called Anna.

In Korean, whether the noun is a subject or an object is indicated by the particles that follow the nouns, and otherwise the two sentences are identical:

Anna가 Billy를 불렀다.

Anna를 Billy가 불렀다.

Anna is the subject of the sentence when her name is followed by the subject case marker 가(ka), as in the first sentence, and it is an object when it carries an object case marker 를(reul) in the second sentence.

What adds to the complexity — on top of the mere existence of the particles, as they are a totally unknown form of grammar in English and many other languages — is that different particle forms are used for nouns ending in consonants and nouns ending in vowels. For example, 을(eul) is an object case marker for consonant-ending nouns, and the aforementioned 를(reul) is for vowel-ending nouns:

집(jip)을 사다.

자동차(jadongcha)를 사다.

The first phrase shows that 을(eul) is used, because 집(jip), meaning house, ends with a consonant. The second phrase uses 를(reul), as 자동차(jadongcha), meaning car, ends with a vowel.
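The consonant-or-vowel rule can be checked mechanically when the noun is written in Hangeul. The following is a minimal Python sketch, not a description of any tool mentioned in this article. It relies on the Unicode layout of precomposed Hangeul syllables, where each initial-medial pair is followed by 28 code points, the first of which has no final consonant. The function names are hypothetical.

```python
from typing import Optional

HANGUL_BASE = 0xAC00    # first precomposed syllable block, 가
HANGUL_LAST = 0xD7A3    # last precomposed syllable block, 힣
FINALS_PER_BLOCK = 28   # 27 possible final consonants plus "no final"


def ends_with_consonant(word: str) -> Optional[bool]:
    """True/False if the word ends in a Hangeul syllable, None otherwise."""
    if not word:
        return None
    code = ord(word[-1])
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        # Latin letters, digits and so on: the code point alone cannot
        # tell us how the word is pronounced.
        return None
    return (code - HANGUL_BASE) % FINALS_PER_BLOCK != 0


def object_particle(noun: str) -> Optional[str]:
    """Return 을 after a final consonant, 를 after a vowel, None if unknown."""
    ending = ends_with_consonant(noun)
    if ending is None:
        return None
    return "을" if ending else "를"


print("집" + object_particle("집"))          # 집을
print("자동차" + object_particle("자동차"))  # 자동차를
```

The check only works for nouns already written in Hangeul. For a name such as Billy the function returns None, because the spelling alone does not reveal the final sound.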

This would no doubt give headaches to learners of Korean as a foreign language. But what does it have to do with challenges in localization? It becomes a problem when we deal with messages with placeholders, for example. Many software localization projects use placeholders to avoid having to create duplicate messages, which saves time and money and achieves consistent quality.

Imagine if we did not have the option to use placeholders. Google would have to create a separate “Welcome to XXX” message for each of the more than a hundred products we offer. So placeholders are a blessing. But they can quickly become a nightmare for Korean localizers. With so many different nouns that can replace “XXX,” you cannot cross your fingers and pray that they all end with a consonant, or that they all end with a vowel. We need to come up with a way to make it work for both cases.

Ever-increasing personalized messages add to the challenge. Messages such as “XXX likes the photo,” common in social networking services, are now found in a variety of other services. Grammatical particles need to follow the names to make a proper sentence in Korean — and we have no idea what strange names are out there!

The Korean localization team at Google takes on these cases in different ways. A few particles are exceptions and keep the same form after both consonant-ending and vowel-ending nouns. “Welcome to XXX” is translated using one of them, the locative particle 에(e). The result is “XXX에 오신 것을 환영합니다,” which is literally “XXX to welcome.” But this is not always feasible. For username placeholders, we add an honorific suffix, 님(nim), making each name into a consonant-ending noun. “XXX likes the photo” is translated to “XXX님이 사진을 좋아합니다,” which literally reads “XXX nim photo likes.” This enables us to safely use particles for consonant-ending nouns and also gives the impression that Google is a polite person (or a polite company, rather). But in many other cases, we resort to adding both case markers, as in “XXX (을)를 찾을 수 없습니다,” which means “Cannot find XXX.” Here, both 을(eul) and 를(reul) are added, with 을 in parentheses, to indicate that either one of them would be used in the sentence.
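As a rough illustration, the three workarounds can be written as message templates whose wording does not depend on how the substituted name ends. This Python sketch copies the Korean strings from the examples above. The strategy labels and the render function are hypothetical and only serve the illustration.

```python
# Templates mirror the article's examples; the dictionary keys are
# made-up labels for the three strategies.
TEMPLATES = {
    # 에 has the same form after consonants and vowels.
    "invariant_particle": "{name}에 오신 것을 환영합니다",
    # 님 always ends in a consonant, so the subject marker 이 is always right.
    "honorific_suffix": "{name}님이 사진을 좋아합니다",
    # Show both object markers when nothing else works.
    "both_markers": "{name} (을)를 찾을 수 없습니다",
}


def render(strategy: str, name: str) -> str:
    """Fill the placeholder without inspecting the name at all."""
    return TEMPLATES[strategy].format(name=name)


for name in ("Anna", "집"):
    print(render("honorific_suffix", name))
```

Because the templates never look at the name, the same string is grammatical for Latin-script names, Hangeul names and anything else a user might type.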

Translators have a lot to say when it comes to localizing messages with placeholders. Korean has a variety of counters, and you need to know what you are counting to choose from them: 개(gae) for counting search results, 명(myung) for counting people, 번(bun) for counting views, 권(kwon) for the number of ebooks, 곡(gok) for music and so on. “X out of Y,” and “X to Y” are some of the messages that can be challenging to localize if you do not have information on what you are counting or if the message is used in multiple places. The ideal solution is to find out what is being counted and use the appropriate counter, but as a workaround we sometimes use neutral translations such as “X/Y” or “X – Y.”
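The counter problem can be sketched in Python as follows, assuming the code knows what kind of thing is being counted. The category keys, the function name and the “Y중 X” phrasing for “X out of Y” are assumptions made for the example, not strings taken from a real product.

```python
from typing import Optional

# Counters named in the text, keyed by hypothetical category labels.
COUNTERS = {
    "search_result": "개",
    "person": "명",
    "view": "번",
    "ebook": "권",
    "song": "곡",
}


def x_out_of_y(x: int, y: int, kind: Optional[str] = None) -> str:
    """Format "X out of Y", using a counter only when the kind is known."""
    counter = COUNTERS.get(kind) if kind is not None else None
    if counter is None:
        # Neutral workaround: no counter, no risk of picking the wrong one.
        return f"{x}/{y}"
    return f"{y}{counter} 중 {x}{counter}"


print(x_out_of_y(3, 10, "ebook"))  # 10권 중 3권
print(x_out_of_y(3, 10))           # 3/10
```

The fallback branch is the neutral translation described above: it is less natural, but it cannot attach the counter for people to a list of songs.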

Friendly, but not rude

Korean is an honorific language with extremely systematic grammar to indicate the social relationship between the speaker, addressee and the subject. Special honorific nouns and verbs are used to show respect to the subject of the sentence. Seven different speech levels have their own set of verb endings to indicate the relationship of the speaker to the addressee.

It is even said that in Korean, a sentence cannot be uttered without the speaker’s knowledge of his or her social relationship to the addressee, considering social status, age, kinship, familiarity and so on. Otherwise, the utterance may sound rude, inappropriate or awkward. Picking the correct speech level is far from simple. Of the seven levels, hasoseo-che, one of the two higher levels, is no longer used in daily conversation. Hage-che and hao-che are merging with haeyo-che, all middle level. So contemporary Korean mainly uses four levels: hapsyo-che, a formal style; haeyo-che, semi-formal; and haera-che and hae-che, both informal.

In Figure 1, different verb endings are used after 로그인 (log in) for each of the speech levels. Choosing the correct speech level can be a tricky business. Google aims to be friendly to our users, but not to the extent that we come out sounding rude. The lowest two levels are used when conversing with close friends or with those lower in social status, leaving Google with hapsyo-che and haeyo-che as options. We used to translate in the formal hapsyo-che for ad products and the semi-formal haeyo-che for consumer products. But as more people perceive hapsyo-che as being an older style, we now use the semi-formal haeyo-che for all our products, with exceptions such as legal documents. This is part of our constant effort to be closer to users, but not so close as to be deemed rude.

Hangeul: a syllabic writing system

Although lonesome in the world of languages, Korean has a very good friend which came into being with the sole purpose of supporting it. Hangeul is unique in that it was a deliberate invention to provide a way to write the Korean language. The script was invented in 1443 and is regarded by linguists as the most scientific writing system in the world. It can express as many as 8,778 sounds with ten vowels, while maintaining the strict one-to-one correspondence between a written syllable and sound. The shapes of the characters represent the way the mouth, tongue and lips form the sounds.

Hangeul is different from most writing systems in that it is half-alphabetic and half-syllabic. It is alphabetic in that one letter corresponds to one sound. This way of writing a word coincides with the convention of linear writing in English. However, the letters are not written linearly as in English. Instead, they are grouped into syllabic blocks of two to three letters — initial, medial and final — which makes the writing nonlinear. These blocks are arranged horizontally from left to right or vertically from top to bottom. To write Hangeul, for example, you would not write h-a-n-g-e-u-l, you would write in blocks: han-geul, 한(han) and 글(geul). As seen in Figure 2, although 한 looks like one letter, it is actually composed of three letters: ㅎ,ㅏ and ㄴ, each representing the sound h, a and n. In 글, the letters ㄱ, ㅡ and ㄹ represent the sounds g, eu and l respectively.
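The block structure is also visible in how Unicode encodes Hangeul: every precomposed syllable block has a code point computed from the indices of its initial, medial and final letters. The following Python sketch reverses that arithmetic to split a block back into letters. The function name is hypothetical, and the letter tables follow the standard Unicode ordering.

```python
from typing import List

INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
MEDIALS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ",
          "ㄻ", "ㄼ", "ㄽ", "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ",
          "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]   # 27 finals plus none

HANGUL_BASE = 0xAC00
HANGUL_LAST = 0xD7A3


def split_block(block: str) -> List[str]:
    """Split one precomposed syllable block into its two or three letters."""
    code = ord(block)
    if not HANGUL_BASE <= code <= HANGUL_LAST:
        raise ValueError(f"{block!r} is not a precomposed Hangeul syllable")
    index = code - HANGUL_BASE
    initial = INITIALS[index // (len(MEDIALS) * len(FINALS))]
    medial = MEDIALS[(index // len(FINALS)) % len(MEDIALS)]
    final = FINALS[index % len(FINALS)]
    return [initial, medial] + ([final] if final else [])


print(split_block("한"))  # ['ㅎ', 'ㅏ', 'ㄴ']
print(split_block("글"))  # ['ㄱ', 'ㅡ', 'ㄹ']
```

A block without a final consonant, such as 가, comes back as two letters, which matches the “two to three letters” described above.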

The ingenuity of the script earns praise from linguists but can be the cause of headaches for people in the localization industry. The nonlinear writing system can be problematic with text wrap. This is a challenge that is shared with Chinese and Japanese. In English, when a word is too long to fit into the remaining space of a given line, it is shifted to the next line or hyphenated. However, in Korean, a word can be incorrectly chopped up if multiple syllable blocks form a single word. Manually adding linebreaks doesn’t always fix this, as different devices have different screen sizes.

Figure 3 shows a typical linebreak error in Korean. The first sentence, “새 Google 지도를 만나보세요,” which means “Meet the new Google Maps,” has the last word, 만나보세요, split in two, with 만 on the first line and 나보세요 on the following line.
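One way to avoid this kind of split is to allow breaks only at spaces, so that a multi-block word always moves to the next line as a whole. The Python sketch below is an assumption-laden illustration, not the fix used in any Google product. It measures width in terminal-style columns, counting each Hangeul block as two, and the function names are hypothetical.

```python
import unicodedata
from typing import List


def display_width(text: str) -> int:
    """Approximate width in columns: wide East Asian characters count as 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def wrap_at_spaces(text: str, max_width: int) -> List[str]:
    """Greedy line wrap that never breaks inside a space-delimited word."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if current and display_width(candidate) > max_width:
            lines.append(current)   # close the line before the long word
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


for line in wrap_at_spaces("새 Google 지도를 만나보세요", 20):
    print(line)
```

With a width of 20 columns, 만나보세요 lands on the second line intact. A word wider than the limit is left on its own overlong line rather than being chopped, which is a deliberate trade-off in this sketch.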

Korean’s unique look and feel, although beautiful, means fonts that are fine in English can look ugly in Korean. Large font sizes can make words look as if they were written by a kindergartener. Figure 4 is from Project 10¹⁰⁰, where Google, celebrating its tenth birthday, asked for ideas to change the world and promised to fund the best ones.

Italics also don’t work, as they tilt the square-looking blocks. At Google, we pay extra attention so that the translation is not lost in these font issues. We have specific guidelines on which font type and size to use. We ask translators to remove italics tags if possible and to check for linebreak errors and inappropriate fonts. As clothes make the man, font makes the text.

Figure 5 shows part of a Help Center article for Google Accounts. Part of the second-to-last line is in italics, which makes the Korean script look out of place. We solved this by removing the italics tags from before and after the sentence.

True localization

Localization is key to attracting international users, so any global business aiming to succeed in international markets would need to get it right. For us it means even more. This is our way of making information universally accessible to everyone, independent of which language they speak, which is part of Google’s stated mission.

Korean consumers have very high standards. They know what they want and they are used to getting it. Korean businesses are quick to meet their needs and agile in adopting new trends. Korea is well known for its 24-hour food delivery and 24/7 customer service. Hungry at 3 a.m.? You can choose from a whole range of foods (hamburgers, pizza and fried chicken, or rice, sushi and pasta) delivered right to your doorstep in no time. Your computer crashed? Just call the manufacturer hotline and a repairman will be ringing your doorbell, even on weekends. There is even an “errand service” where you can ask for anything: have someone pick up the dress you need for a party from the laundry by a specified time and bring it to your office, or have someone stand in line for you at a must-eat restaurant that doesn’t take reservations. To attract such demanding consumers, investing in localization is essential.

Google has a dedicated team to provide quality localization to users in non-English speaking markets. And Korea, with its leading global brands and tech-savvy users, is a strategic market for Google. We are constantly working to identify and tackle the unique challenges we face in localizing into Korean. We are lucky to have demanding users in Korea, who will not be satisfied with anything below expectations, to push us to be even better.