Creating a language lexicon for Asian languages


Jordi Torras
MultiLingual Oct/Nov 2016
Core Focus

Creating lexicons for Asian languages (Chinese, Japanese and Korean) was no small feat: these languages combine several written styles of characters with extensive grammatical structures for expressing politeness and formality.

The Japanese lexicon was particularly tough to pair with NLP applications because the language has four different writing systems, all of which can be used together and interchangeably. The Chinese lexicon was designed to support the traditional and simplified Chinese writing systems simultaneously, which allows the same semantic technology to be used in mainland China, Hong Kong, Macau, Taiwan and overseas Chinese communities. And the Korean lexicon was written almost entirely in Hangul, whose letters are grouped into syllable blocks rather than written in sequential order....

In Japanese and Korean, nouns have neither grammatical gender nor number. This type of information is optional and, when needed, is added using affixes. In both languages, demonstratives and determiners have several degrees, as shown in Table 1.
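To make the idea of optional, affix-marked number concrete, here is a minimal sketch in Python. It assumes one common plural suffix per language (たち in Japanese, 들 in Korean) and deliberately ignores other suffixes and usage restrictions; the Noun class, the PLURAL_SUFFIX table and both functions are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumption: one illustrative plural suffix per language. Real usage
# has more suffixes and restrictions than this table captures.
PLURAL_SUFFIX = {"ja": "たち", "ko": "들"}


@dataclass(frozen=True)
class Noun:
    lemma: str
    lang: str  # "ja" or "ko"


def surface_form(noun: Noun, plural: bool = False) -> str:
    """Return the bare lemma, or lemma plus suffix when number is marked explicitly."""
    if not plural:
        return noun.lemma
    return noun.lemma + PLURAL_SUFFIX[noun.lang]


def analyze(form: str, lang: str) -> Tuple[str, Optional[str]]:
    """Naively strip a plural suffix; number is None when the form leaves it unmarked."""
    suffix = PLURAL_SUFFIX[lang]
    if form.endswith(suffix) and len(form) > len(suffix):
        return form[: -len(suffix)], "plural"
    return form, None


print(surface_form(Noun("学生", "ja"), plural=True))
print(analyze("학생들", "ko"))
print(analyze("学生", "ja"))
```

Returning None rather than "singular" for an unmarked form reflects the point above: a bare noun does not commit to a number, so the lexicon entry leaves that feature unspecified.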

Japanese verbs have three main tenses: past, non-past (which covers present and future) and continuous present (comparable to -ing verbs in English). Korean has a richer and more complex verbal system that explicitly distinguishes future, conditional, near future, far future and so on.

Unlike artificial languages such as programming languages, natural language gives us the ability to understand, process and use the everyday semantics with which we communicate. Through the creation of these complex lexicons, businesses can now understand the meaning behind the questions asked by their Japanese-, Chinese- and Korean-speaking customers.

Asian languages have already made a significant impact on business and culture worldwide and will continue to exert increasing influence for the foreseeable future. For businesses large and small that want to compete in a global, multilingual economy, it’s imperative not only to understand the differences among natural languages and between natural and formal languages, but also to leverage language nuances and subtleties to refine the online user experience in a meaningful and profitable way....