Who sells chicken eggs: Cracking Thai localization

It is only appropriate for me to apologize early on if many of the things discussed here don’t make much sense to you. The title is one of those many things, I suppose, and that is why Thai is such a perfect language for adventurous, fun-loving people. Foreigners beginning to learn Thai may know what this sentence means, but saying it aloud is a challenge: ใคร ขาย ไข่ ไก่, roughly translated as “who sells chicken eggs?” (kh(r)ai-khai-khai-kai). I am going to skip the International Phonetic Alphabet to leave some desirable confusion in this article. This question may not be a very intelligent thing to ask, but it represents well how tonal Thai is: what sounds like the same word, spoken with different tones, is almost always considered a different word with a completely different meaning.

Therefore, a simple mispronounced tone could take you from the word “cool!” (Cheng loey — เจ๋งเลย — with cheng in rising tone) to the almost opposite idea summed up in the phrase “what a complete mess!” or in some contexts “going bankrupt” (Cheng loey — เจ๊งเลย — with cheng in high tone). The only difference in these two words (เจ๋ง vs. เจ๊ง) is the tonal diacritics on top of the words. Visitors to Thailand are often taught the sentence คุณสวยมาก (Khun suay maak, with suay in rising tone) and the new students would invariably say คุณซวยมาก (Khun suay maak, with suay in mid tone). The first means “You are so beautiful,” while the latter means “You are so jinxed.” The only difference in this example is the base consonant of the word suay (ส vs ซ). The interesting part is the fact that the tonal marks are not the only things that govern tones in Thai. The base consonants have a say, too. 
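For readers who like to see the rules written down, here is a minimal Python sketch of how the consonant class and the tone mark together decide the tone. It is a simplification, not a full tone engine: it assumes a live syllable (one ending in a long vowel or a sonorant), looks only at the first consonant, and ignores special cases such as a silent leading ห. The names `tone_of`, `consonant_class` and `TONE_TABLE` are made up for this illustration.

```python
# Simplified Thai tone lookup. ASSUMPTION: live syllables only,
# first consonant decides the class, no silent-leading-consonant rules.
MID = set("กจฎฏดตบปอ")
HIGH = set("ขฃฉฐถผฝศษสห")
LEADING_VOWELS = set("เแโใไ")  # vowels written before the initial consonant

TONE_MARKS = {
    "\u0e48": "mai ek",
    "\u0e49": "mai tho",
    "\u0e4a": "mai tri",
    "\u0e4b": "mai chattawa",
}

# Tone of a live syllable, by consonant class and tone mark (None = no mark).
TONE_TABLE = {
    "mid": {None: "mid", "mai ek": "low", "mai tho": "falling",
            "mai tri": "high", "mai chattawa": "rising"},
    "high": {None: "rising", "mai ek": "low", "mai tho": "falling"},
    "low": {None: "mid", "mai ek": "falling", "mai tho": "high"},
}


def consonant_class(ch):
    """Return 'mid', 'high' or 'low' for a Thai consonant letter."""
    if ch in MID:
        return "mid"
    if ch in HIGH:
        return "high"
    return "low"


def tone_of(syllable):
    """Look up the tone of a single live syllable (hypothetical helper)."""
    initial = next(ch for ch in syllable if ch not in LEADING_VOWELS)
    mark = next((TONE_MARKS[ch] for ch in syllable if ch in TONE_MARKS), None)
    return TONE_TABLE[consonant_class(initial)][mark]


for word in ["เจ๋ง", "เจ๊ง", "สวย", "ซวย"]:
    print(word, tone_of(word))
```

Under these assumptions the sketch agrees with the examples above: เจ๋ง is rising, เจ๊ง is high, สวย is rising and ซวย is mid, even though the last two carry no tone mark at all.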

I was walking by a four-star hotel in Singapore’s Bugis Junction and something caught my eye. It was a door. But what made it so special were the words in different languages on it: hello, ciao, hola and so on. The hotel was making its international welcome apparent even on the service door. Nice touch, I must say. At this point my patriotism kicked in and I started looking for something in Thai, which in recent years has climbed up to the first 28-language tier in many localization efforts. And there it was, a Thai word, right in the middle of the multilingual door.

I was overwhelmingly proud, partly because it was in the middle, and partly because the word could be translated as material(s), appearing among words in other languages that (as far as I know) say “Hello.”

The thing is, the famous word for hello, Sawasdee, has not been replaced by the one meaning material(s) yet. Not that I know of. So this must be a mistake. Of course it wasn’t completely the hotel’s fault. It was probably the small window cutout in the door that got the word trimmed which, to be fair, made it the hotel’s fault still. But the point is, Thai words are like earthworms. You cut one in half and you may get two wriggling words that live happily for the rest of their lives. (This paragraph contains disturbing descriptions; readers’ discretion is advised.) In Example 1, what is almost a full word for material(s) lives in the word hello.  

Please allow me to give a classic if not extreme example: นวลลออมองยลมวลภมรลอยวนบนดอกบอน. This is a complete sentence written entirely with consonant letters and no vowel symbols (it’s different from the one I usually cite: กรกนกชอบลองขนมอบกรอบ). It’s a ten-word sentence conveying the idea that a pretty girl looks at and admires all the insects that fly around above the flower(s). In this original meaning, the words are broken down like this, with lines between the words for your convenience:

นวลลออ|มอง|ยล|มวล|ภมร|ลอย|วน|บน|ดอก|บอน. For your inconvenience, however, the letters could be regrouped to form an entirely different set of words:

นวลล|ออม|องย|ลม|วลภมร|ลอ|ยวน|บนด|อก|บอน.

In this grouping, the sentence is about wind, speaking rubbish and breasts. It will be obvious from this example that Thais don’t punctuate the way the rest of the world does. We do it with imaginary marks that are not very obvious to non-native readers. And we don’t bother separating words with spaces, which we think are a waste of bytes because, duh, they are blanks anyway. This is why Thais are great spies. It means text just runs freely like cheap stockings. If a five-paragraph sentence can theoretically be achieved, it’s got to be in Thai. So the morals of the story are:

Making a line break in Thai text is challenging even for a Thai native speaker, since it relies pretty much on one’s understanding of the context.  

Making a mistake in a line break could give you a very nasty earthworm. And since Thai does not show plurality, it could very well be several nasty earthworms for you. I can assure you the “material” in “hello” is one of the more proper ones.

Translation from a single Thai original can vary greatly from one human translator to another. Identifying where sentences and ideas begin and end can be highly subjective.
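The earthworm problem can be reproduced in a few lines of Python. The sketch below is a toy: its "dictionary" is nothing more than the pieces of the two groupings of the consonant-only sentence shown earlier, not a real Thai lexicon, and the function name `segmentations` is invented for this illustration. It simply enumerates every way the unspaced sentence can be cut into pieces found in that dictionary.

```python
from functools import lru_cache

# The two groupings from the article. ASSUMPTION: used here as a toy lexicon.
READING_A = ["นวลลออ", "มอง", "ยล", "มวล", "ภมร", "ลอย", "วน", "บน", "ดอก", "บอน"]
READING_B = ["นวลล", "ออม", "องย", "ลม", "วลภมร", "ลอ", "ยวน", "บนด", "อก", "บอน"]
SENTENCE = "".join(READING_A)
LEXICON = frozenset(READING_A) | frozenset(READING_B)


def segmentations(text, lexicon):
    """Return every split of text into pieces that all appear in lexicon."""
    max_len = max(len(piece) for piece in lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        if start == len(text):
            return [[]]  # one way to split nothing: no pieces
        results = []
        for end in range(start + 1, min(len(text), start + max_len) + 1):
            piece = text[start:end]
            if piece in lexicon:
                results.extend([piece] + rest for rest in split_from(end))
        return results

    return split_from(0)


all_splits = segmentations(SENTENCE, LEXICON)
print(len(all_splits), "possible groupings")
print(READING_A in all_splits, READING_B in all_splits)
```

Even with this tiny dictionary both readings come back as valid, along with mixtures of the two, which is why dictionary lookup alone cannot settle where a Thai word ends and why context has the final say.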

Back to the “material(s)” in “hello.” Even without considering that it was a small cutout of the real word, the word was still wrong on that door. Probably someone copied the word สวัสดี only partially and left the last little character out. That’s another challenge for a non-native speaker working with Thai. Thai characters occupy four vertical levels in a line: the main line (where the English alphabet sits), two upper levels and one lower level, as illustrated in Example 2.

These four words consist of 17 characters, only 11 of which are base characters. The rest are vowels and tonal diacritics that sit above or below them. In some applications, where pixels are so expensive and the number of characters is strictly counted, Thai words are often flagged as too long even though the displayed width is just fine. In some localization tools, the string fails the character count; in others, it fails the pixel height restriction.
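The gap between the counted length and the visible width is easy to measure. Here is a small Python sketch using only the standard `unicodedata` module; the helper name `count_parts` is made up, and treating every non-spacing mark (Unicode category `Mn`) as a "floater" is an assumption that fits Thai upper and lower vowels and tone marks but is not a full text-measurement method.

```python
import unicodedata


def count_parts(text):
    """Count code points, base characters and non-spacing marks in text."""
    marks = sum(1 for ch in text if unicodedata.category(ch) == "Mn")
    return {
        "code_points": len(text),   # what a strict character counter sees
        "base": len(text) - marks,  # roughly what takes up horizontal space
        "marks": marks,             # upper/lower vowels and tone marks
    }


print(count_parts("สวัสดี"))
print(count_parts("การติดตั้ง"))
```

For สวัสดี a character counter sees six code points while only four of them occupy their own slot on the main line, so a limit expressed in characters is stricter for Thai than the same limit expressed in width.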

It is also important to know that the upper and lower characters are dependents. They cling to their base-character hosts. Most desktop publishing mistakes in Thai happen when you copy just the bases and leave the small bits out, or insert line breaks between a base and these dependents. Sometimes it just shows that you don’t know Thai, but too often you would create, yeah, additional mutated earthworms. 
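One way to keep dependents attached to their hosts is to cut text only between clusters, never inside one. The Python sketch below groups each base character with the marks that follow it and truncates on those boundaries. It is a simplification of proper grapheme segmentation (it relies only on the `Mn` category, so it would not treat a spacing vowel such as ำ as part of the cluster), and both function names are hypothetical.

```python
import unicodedata


def clusters(text):
    """Group each character with the non-spacing marks that follow it."""
    out = []
    for ch in text:
        if out and unicodedata.category(ch) == "Mn":
            out[-1] += ch  # a dependent: glue it to its host
        else:
            out.append(ch)
    return out


def truncate_clusters(text, max_code_points):
    """Cut text to at most max_code_points without splitting a cluster."""
    kept = ""
    for cluster in clusters(text):
        if len(kept) + len(cluster) > max_code_points:
            break
        kept += cluster
    return kept


word = "สวัสดี"
print(clusters(word))
print(word[:5])                    # naive cut: the last vowel is left behind
print(truncate_clusters(word, 5))  # cluster-aware cut
```

The naive slice keeps the final base consonant but drops its vowel, which is exactly the kind of half-copied word described above; the cluster-aware version gives up the whole final cluster instead.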

Sometimes the combination of font and program just does not work for Thai. When that happens, the top-level characters can disappear from a string of text. For example, if a heading was supposed to say “installation” (การติดตั้ง), with just one character missing, this heading will now colloquially say “To owe (someone) money” (การติดตัง). The change can be quite drastic for just one missing diacritic.
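If the text survives somewhere as a string (after a copy and paste, a file conversion or an export), a lost diacritic can be caught mechanically. The following Python sketch compares an expected string with what came out the other end and reports marks that vanished while the base letters stayed the same. The helper names are invented for this example, and it cannot detect marks that are present in the data but merely fail to render in a given font.

```python
import unicodedata
from collections import Counter


def strip_marks(text):
    """Remove all non-spacing marks, leaving only base characters."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def lost_marks(expected, actual):
    """Name the marks missing from actual; None if the base letters differ."""
    if strip_marks(expected) != strip_marks(actual):
        return None  # not just a diacritic problem
    wanted = Counter(ch for ch in expected if unicodedata.category(ch) == "Mn")
    found = Counter(ch for ch in actual if unicodedata.category(ch) == "Mn")
    return [unicodedata.name(ch) for ch in (wanted - found).elements()]


print(lost_marks("การติดตั้ง", "การติดตัง"))
```

For the installation heading this reports a single missing tone mark, THAI CHARACTER MAI THO, which is all it takes to turn a manual into a debt notice.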

Speaking of good characters in a bad place (story of my life), it’s also good to know that the input of certain Thai characters needs to follow a very specific order. You may say, “Of course it has to be in a specific order, we do that all the time in other languages, too, dude.” Well, that is true, but not many languages have tiny characters stacking up or hanging down in four levels, either. Some characters are always in front of their base consonants, while some others stay behind all the time. Believe me, a lot of Thai people still cannot get it right. 

It only gets more complicated when you have an upper vowel plus a tonal mark, and both are floaters. In the illustrated text in Example 3 you have to type the orange vowel followed by the blue tonal mark. However, in the illustrated text in Example 4 you will have to type the black tonal mark (the same character as the blue one) before the red vowel. Yeah, things are clearly very consistent in Thai. 
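The ordering can be checked in code, at least roughly. The Python sketch below assumes the conventional order for Unicode-encoded Thai: an upper or lower vowel comes before the tone mark, while the vowel ำ (SARA AM) is the well-known case where the tone mark is typed first. Whether Examples 3 and 4 show exactly these two cases is an assumption, since the illustrations are not reproduced here, and the function name `misordered` is made up.

```python
# Tone marks: mai ek, mai tho, mai tri, mai chattawa.
TONE_MARKS = set("\u0e48\u0e49\u0e4a\u0e4b")
# Vowels drawn above or below the base consonant (U+0E31, U+0E34..U+0E39).
ATTACHED_VOWELS = set("\u0e31\u0e34\u0e35\u0e36\u0e37\u0e38\u0e39")


def misordered(text):
    """Return indexes where a tone mark was typed before an attached vowel."""
    return [
        i
        for i in range(len(text) - 1)
        if text[i] in TONE_MARKS and text[i + 1] in ATTACHED_VOWELS
    ]


print(misordered("ท\u0e35\u0e48"))  # vowel, then tone mark
print(misordered("ท\u0e48\u0e35"))  # tone mark, then vowel
print(misordered("น\u0e49\u0e33"))  # tone mark before SARA AM
```

The first and third strings pass, while the second is flagged at index 1. On screen the first two can look identical in a forgiving program, which is what makes this mistake so hard to spot by eye.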

Now, some applications are smart enough to correct such bad behavior for you. But the input order is still treated inconsistently among different programs. It is therefore advisable to be very careful when you move the cursor. Sometimes it won’t show on screen, but you will have already moved past something. Or you inadvertently, ever so lightly, press a key and nothing appears to change. However, in some cases you may very well end up with corrupted blocks over your text, such as in Example 5. A simple rule is that no upper-level character can lead a word, a sentence or a line.
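That simple rule lends itself to an automated check. Here is a Python sketch that scans text for marks with no host: a mark at the very start of a line or right after whitespace. It extends the rule to lower-level marks as well, since it tests for any non-spacing mark (category `Mn`), and the function name `orphan_marks` is hypothetical.

```python
import unicodedata


def orphan_marks(text):
    """Return (line number, column) of marks that have no base character."""
    problems = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line):
            if unicodedata.category(ch) != "Mn":
                continue
            # A mark is orphaned if nothing, or only whitespace, precedes it.
            if col == 0 or line[col - 1].isspace():
                problems.append((line_no, col))
    return problems


print(orphan_marks("สวัสดี"))
print(orphan_marks("การติดตั\n\u0e49ง"))  # line break inserted before the tone mark
```

The first string comes back clean. The second reports line 2, column 0, where a tone mark has been pushed onto a new line away from its base consonant.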

Well, I think I have confused you enough! Mind you, I still have the challenges in transliteration, plurality, pronouns and many more in my pocket. They could make for great sequels. But these are the main things that Thai localizers and publishers face every day. And they should suggest some precautions for handling the Thai language and her siblings in the region.