Localizing technologies for Swiss German speakers

Swiss German is a group of spoken dialects used in the German-speaking regions of Switzerland. Spoken by approximately 65% of the population, it is the common language of everyday communication.

Swiss Standard German is the variety of High German used in formal and written contexts such as professional communication, newspapers, TV news, and legal documentation. It is also the variety of German taught at school, including in the non-German-speaking regions of Switzerland.

Because Swiss German dialects are primarily spoken, writing them means transcribing speech. There is no standard orthography for Swiss German, so people use the German letters they think best represent the dialectal pronunciation. This is visible whenever Swiss German speakers write messages, posts, or comments online.

Because pronunciation varies from one canton to another, spelling varies too; one example is the phonetic distinction between ‘e’ and ‘ä’. The verb ‘aufstellen’, for instance, is written ‘ufbouä’ in the Bern dialect and ‘ufbaue’ in the Zurich, Luzern, and Basel dialects.

Swiss German dialects in technologies

Swiss people use their dialect in daily communication with friends and relatives, and also professionally with colleagues, customers, partners, and so on. Yet everyday tools such as spelling correctors, predictive text, and automatic translators simply do not exist for Swiss German.

Because Swiss German is considered mainly a spoken language, academic research has focused primarily on developing pronunciation and speech-related language resources and technologies. However, the lack of sufficiently large written resources limits the development of advanced natural language processing technologies and tools, such as parsing, word sense disambiguation, summarization, machine translation, and bots.

Building Swiss German lexical data for natural language processing (NLP)

At Oxford Languages, we have created a large Swiss German lexical dataset that covers Swiss Standard German and the Swiss German dialects spoken in Bern, Basel, Zurich, and Luzern.

This unique resource enables the development of specialized and customized natural language tools for Swiss German. Our lexical data goes beyond token counts extracted from corpora: we have added morphological information and parallel human translations. This supports further research into complex natural language tools, the adaptation and conversion of existing Standard German tools to Swiss German, and the localization of existing resources.

Our lexical data can also be used with knowledge transfer techniques to localize state-of-the-art High German resources into Swiss German equivalents, and to build resources that combine German varieties for cross-dialectal applications such as information extraction, automatic localization of documents, and even cross-dialectal communication with bots.

Because Swiss German is written according to dialectal pronunciation, our lexical data can also support further research on, and development of, speech resources and technologies.

Learn more about Oxford Languages at https://languages.oup.com/

Meritxell González
Meritxell holds a PhD in Computer Science in the field of Natural Language Processing. She works as a Language Engineer at Oxford University Press, where she applies her extensive experience across Computational Linguistics to leverage the vast amount of data at Oxford Languages and to create new, richer linguistic resources.


MultiLingual Media LLC