Translating the Baltic languages

Is there a difference between the Baltic languages and the languages of the Baltic States? Is Estonian a Baltic language? Is Finnish a Scandinavian language? For linguists, these questions are clear and unambiguous — there are two Baltic languages, Latvian and Lithuanian, while Estonian and Finnish belong to the Finno-Ugric language group.

However, even though it may sound paradoxical, the language industry often does not stick to a strict linguistic classification, and languages are more freely attributed to one group or another. Usually, the concept of the Baltic languages extends to the concept of the languages of the Baltic countries, including Estonian, while the Finnish language is sometimes attributed to the Scandinavian languages. Thus, the regional factor prevails over the linguistic factor.

When talking about the Baltic States, we normally have in mind the three countries on the east coast of the Baltic Sea: Lithuania, Latvia and Estonia. The regional identity and sense of community of these countries evolved most prominently during the Soviet period. All three countries were occupied by the USSR and experienced similar hardships under occupation, such as mass deportations, repression and collectivization. At the same time, their standard of living was similar and somewhat higher than in the rest of the Soviet Union, and all three attempted to preserve their national identity to the largest extent possible. The regional identity of the Baltic countries became particularly strong at the beginning of the 1990s, when the countries sought to restore their independence. This continued after independence had been regained, as they took their first steps as independent countries and began the process of integrating into international organizations. Thus, the concept of the Baltic States was formed historically.

Nowadays, the regional identity of the Baltic countries has somewhat diminished: there is no longer a need to unite against a common enemy or to achieve common goals, and therefore no longer a need for close cooperation among the states. In some cases we can even see competition emerging. Estonia increasingly identifies itself as a Nordic country due to its close economic and cultural ties with Finland. However, the concept of the Baltic States remains relevant, and no one with any understanding of the European map has any doubt as to which countries it covers. Therefore, when companies providing translation services list the Baltic languages, they most often mean the three languages of the Baltic States: Latvian, Lithuanian and Estonian.

However, there are only two living Baltic languages: Latvian and Lithuanian. The Baltic group also includes the extinct languages Prussian, Yotvingian, Curonian, Semigallian and Selonian. The Baltic languages belong to the Indo-European language family. They are of particular interest to linguists, both local and international, because of their archaism: they, and Lithuanian especially, have retained a large number of archaic features associated with Proto-Indo-European. This archaism is also one of the reasons for linguistic purism among Lithuanian and Latvian linguists. On the one hand, purism gives the language greater protection and helps it remain unchanged for longer; on the other hand, it causes a significant headache for those working in the language industry because of frequently changing terminology and the other processes involved in language standardization.



While the major world languages can afford to turn a blind eye to borrowings and foreign structures (Russian, for example, allows almost unlimited use of English words in both speech and writing), the same cannot be said for smaller languages such as the Baltic languages if they wish to preserve their purity. Moreover, it is not enough for Lithuanian and Latvian to “defend” themselves against English, since they also bear the heavy historical imprint of the Slavic languages, mainly Russian.

Standard Latvian and Lithuanian only developed at the end of the nineteenth century and the beginning of the twentieth. As a result, there are still difficulties in their usage that are not found in languages with a longer standardized tradition, such as English, French or Russian. The major initial work on standardizing the two Baltic languages was carried out at the beginning of the twentieth century, in the period when both countries were independent (1918-1940). In 1940, however, this work was interrupted when both countries were occupied by the Soviet Union, and until 1990/1991 (except for the period of Nazi occupation, 1941-1944) the languages underwent heavy Russification. The biggest loss during the Soviet period was that Lithuanian and Latvian lost their status as official languages; Russian was treated as the official language in both states. Fortunately, the use of the local languages was not forbidden: they were used at home, in the mass media and in educational establishments.

It should be noted that while teaching in educational institutions was conducted in the native languages, children began to be taught Russian as early as kindergarten, and in secondary schools Russian was given as much attention as the mother tongue. Public authorities, major companies and institutions prepared their documentation and correspondence in Russian. To promote industrialization and implement the Soviet plan for “mixing the nations,” a large number of personnel from Russia and other Soviet republics were sent to Lithuania, and even more to Latvia. As a result, the proportion of ethnic Latvians in the total population of Latvia fell from 80% in 1935 to 52% in 1989. The newcomers spoke Russian and often had not the slightest intention of learning the local language, frequently considering themselves superior. A situation therefore developed in which communication took place in Russian even where Russian speakers were in a minority.

It is natural, therefore, that Russian affected the local languages. They absorbed a number of Russian words, particularly for the realities of the period, as well as borrowed grammatical structures that do not conform to their own norms. The strong influence of Russian can also be explained by the fact that the Baltic languages and Russian have many similarities, including in grammatical structure. As a result, Russian structures seeped naturally into Latvian and Lithuanian, and people, while using the words of their native language, increasingly relied on the grammar of the foreign one: inappropriate case forms, sentence structures atypical of their language, literally translated phrases and so on.

This is especially true of the jargon of scientists and technical specialists. Most scientific and technical literature was written in Russian, and in academic institutions and in discussions on these topics the terminology and structures of the foreign language increasingly prevailed. Therefore, when the countries became independent (Lithuania in 1990, Latvia in 1991) and language standardization and management began, the greatest task facing linguists was the management of technical terminology and the development of glossaries for a variety of technical fields, abandoning the borrowed Russian words and finding local equivalents for new concepts and for the English terms that usually accompany them. Another difficult task was to abandon the linguistic clichés established over 50 years of occupation and to purge the language of the foreign words habitual to almost every speaker.


Language normalization in the translation industry

Lithuania and Latvia have set up special bodies to conduct language standardization, terminology development and the supervision of language use. In Lithuania, the Lithuanian Language Commission and the Lithuanian Language Institute fulfill this role; in Latvia, it is performed by the Terminology Commission of the Latvian Academy of Sciences and the State Language Center.

Language standardization is not an easy process, in particular because new terminology is not readily accepted by language users. This is particularly true when borrowed foreign words used in everyday language are replaced with newly created native equivalents. Terms newly proposed by linguists are often resisted at first; either they eventually blend into natural usage and become common, or users reject them for good. Much more difficult, however, is the creation and implementation of specialist terminology. Although linguists usually consult specialists in the relevant fields, these professionals do not easily give up the terms already familiar to them and are likely to continue using their conventional jargon, especially words borrowed from English. For this reason, professionals often criticize translators for allegedly doing their work improperly, and some even say they prefer reading specialist literature in English rather than in translation. This is especially true of IT professionals, who are very familiar with English terminology and often prefer to use nonlocalized software.

Thus, in localization it is sometimes a real challenge to please both sides. On the one hand, there are the language standards, which are obligatory for language service providers (LSPs); on the other, there are the end users of the translated texts. Companies, for their part, are reluctant to modify approved terms that are already in use, because of the related disadvantages and the high cost involved.

The quality of Baltic language localization is determined by two things. The first is whether the glossary of terms was prepared properly, using the terms approved by linguists. Companies often rely on their local representatives in a given country rather than on localization professionals, considering the representatives’ opinion more significant; hence, an LSP has to put considerable effort into client education and communication with the client’s local representatives. Unfortunately, LSPs are not always successful in this, especially where the terminology is already in use and the client is not willing to change it. It is therefore sometimes necessary to accommodate the client’s wishes and use the wrong terms.

Linguists and terminologists are another headache for LSPs. Philologists are pure linguists who have little understanding of the impact of their work on the translation and localization industry, or are even completely unaware that such an industry exists. They often readily replace an older, already-approved term with another. A situation then arises where the client presents the LSP with a translation memory (TM) consolidated from translations made at different times and does not require changes to be made to full matches. In such a TM, the same term can be translated in different ways, each of which was correct at a specific time. For example, the Lithuanian translation of the word scanner has changed several times in the past six to seven years: skaneris, skeneris and nuskaitytuvas were all used until skeneris and skaitytuvas were finally settled on. Another example is the Lithuanian computer terminology dictionary issued in 2003 by Valerijonas Žalkauskas, which, as instructed by the Lithuanian Language Commission, had become the Bible of all translators and users of IT terminology until 2005, when the Encyclopaedic Dictionary of Computing was issued and declared a “new Bible.”

Thus, localization companies often need to find a compromise between their intention to provide their clients with linguistically correct, high-quality products and the need to conform to client preferences.
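The term drift described above (skaneris and nuskaitytuvas giving way to skeneris and skaitytuvas) can be checked for mechanically before a consolidated TM is reused. The following is a minimal Python sketch, not a description of any particular CAT tool. The function name find_outdated_terms, the stem table and the pairing of each older variant with a preferred one are assumptions made for illustration, and simple prefix matching on stems is only a rough stand-in for real Lithuanian morphology.

```python
import re

# Hypothetical table: stem of a superseded variant -> currently preferred term.
# The pairing is an assumption; the article only lists which variants survived.
OUTDATED_STEMS = {
    "skaner": "skeneris",
    "nuskaitytuv": "skaitytuvas",
}

def find_outdated_terms(target: str) -> list[tuple[str, str]]:
    """Return (word found, preferred term) pairs for one TM target segment."""
    hits = []
    for word in re.findall(r"\w+", target.lower()):
        for stem, preferred in OUTDATED_STEMS.items():
            # Prefix matching catches inflected forms such as "skanerį".
            if word.startswith(stem):
                hits.append((word, preferred))
    return hits

print(find_outdated_terms("Įjunkite skanerį."))
print(find_outdated_terms("Įjunkite skenerį."))
```

A check of this kind only flags segments for a human reviewer. Whether an older term is actually replaced still depends on what the client has agreed to.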

The second factor determining the quality of localization is not only adequate translation skills but also proper linguistic editing of the translated texts. Paradoxically, the majority of language industry professionals do not know their native language well enough to avoid stylistic, grammatical and other errors. Therefore, all texts must be reviewed by language editors specializing in Lithuanian or Latvian philology. As in all languages, the translation of specialized texts is usually entrusted to translators who specialize in the relevant areas and usually have the corresponding educational background: medical specialists, IT specialists, engineering experts and others. While they are proficient in their field of expertise, the source language and the specialist terminology of the native language, they may lack a deeper understanding of linguistics and so reproduce incorrect structures without noticing them. Such structures seem correct to an ordinary language user because they have been entrenched in the language over a long period as a result of foreign language influence. The task of the language editor is to fix such errors.

Clients, however, are not always positive about the work of LSPs either. Polished, correct language sometimes seems unnatural to a client who is used to more colloquial structures. LSPs are therefore often faced with situations where the client remains dissatisfied with the translated text. Local reviewers contracted by the client may make a mass of “corrections” in translated and edited texts, thereby contaminating them with many different types of errors. LSPs then often have to go to considerable lengths to explain the language rules and write a lengthy response to each correction.



of Lithuanian translation

Lithuanian, like Latvian, uses a modified Latin alphabet. The Lithuanian alphabet consists of 32 letters: 23 are unaltered Latin letters (all except Q, W and X), and the remaining nine are formed by adding diacritic marks. Lithuanian has preserved many ancient forms, so in terms of archaism it is comparable to Latin and Ancient Greek. Lithuanian grammatical forms are similar to the ancient Indo-European forms, and in some cases are even more archaic. Latvian also has archaic forms, but in comparison with Lithuanian it is much more modern.

Lithuanian nominals have the grammatical categories of case, gender and number. There are seven cases, two genders and two numbers. Handling grammatical case can be problematic when translating into Lithuanian with CAT tools, since identical source phrases sometimes have to be translated differently. This is especially true of lists, as the translation of the list items depends on the opening sentence (Table 1).

When the opening sentence is replaced, the form of the list items changes as well. In the first example, the opening sentence requires the accusative case, whereas in the second the nominative is required. Another similar example is found in Table 2: the first opening sentence requires the dative case, while the second requires the nominative. It should be noted that not only the nouns but also the adjectives and all other inflected words must be changed.

As we can see, translation with CAT tools poses difficulties. On the one hand, to ensure the consistency of the TM, identical source sentences should be translated into identical target sentences, but in situations like this it is not always possible to reformulate the opening sentence artificially so that the list items that follow remain the same. On the other hand, the majority of clients require the review of full matches to be skipped and do not pay for their editing. TM segments inserted automatically from the memory may therefore be grammatically or stylistically incorrect because of differences in the opening sentence. By declining the review of 100% matches, the client risks compromising the text.
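One way to picture the risk is to store, with each TM entry, the source segment that preceded it, and to treat a 100% match as safe only when that context also matches. The Python sketch below illustrates the idea under stated assumptions: TMEntry, classify_match and the field names are hypothetical, real CAT tools implement context matching in their own ways, and the list item “New file” with its two opening sentences is an invented illustration rather than the content of the tables mentioned above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TMEntry:
    source: str       # source segment as stored in the TM
    target: str       # Lithuanian translation stored with it
    prev_source: str  # source segment that preceded it when it was translated

def classify_match(entry: TMEntry, source: str, prev_source: str) -> str:
    """Label a lookup so that out-of-context full matches are sent for review."""
    if entry.source != source:
        return "no match"
    if entry.prev_source == prev_source:
        return "context match"
    return "100% match, review needed"

# Stored after an opening sentence that requires the accusative.
entry = TMEntry(source="New file", target="naują failą", prev_source="Select:")

print(classify_match(entry, "New file", "Select:"))
# A different opening sentence would need the nominative, "naujas failas".
print(classify_match(entry, "New file", "Available options:"))
```

The second lookup is still a full match on the source text, but it is labeled for review because the stored accusative form would be wrong after the new opening sentence.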

An even greater diversity in the translation of fuzzy or 100% matches occurs when there is a numeral before the noun. In Lithuanian, numerals require different forms of the noun, which depend not only on the number but also on the case and gender. Moreover, as CAT tools often treat numbers as variables, the actual number may not show up in the source segment. Therefore, even when the source segments are identical, their Lithuanian translations may differ depending on the numeral, and translating such sentences with CAT tools requires extraordinary alertness. In English, numerals require only two forms of the noun, singular or plural: 1 degree, but 2, 3, 4 and 1.5 degrees. In Lithuanian, numerals require many more noun forms; for example, 1 laipsnis, 2-9 laipsniai, 10 laipsnių, 21 laipsnis, 22-29 laipsniai, 30 laipsnių and 1,5 laipsnio. Moreover, nominals are inflected, meaning that their endings change depending not only on the numeral but also on the case.
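The pattern in the laipsnis examples can be written down as a small selection rule. The Python sketch below is a minimal illustration covering only the nominative context shown above. The function names and the category labels ("one", "few", "many", "other") are illustrative choices. The handling of 11-19 as laipsnių is not listed in the text and is added here as standard Lithuanian usage. A full solution would also have to vary the forms by case, as the paragraph notes.

```python
from decimal import Decimal

# Forms of "laipsnis" (degree) taken from the examples in the text.
FORMS = {
    "one": "laipsnis",    # 1, 21, 31, ...
    "few": "laipsniai",   # 2-9, 22-29, ...
    "many": "laipsnio",   # fractions such as 1,5
    "other": "laipsnių",  # 0, 10-20, 30, ...
}

def plural_category(number: str) -> str:
    """Pick the noun-form category for a number written with a decimal comma or point."""
    value = Decimal(number.replace(",", "."))
    if value != value.to_integral_value():
        return "many"      # any fractional value
    n = abs(int(value))
    if 11 <= n % 100 <= 19:
        return "other"     # 11-19 behave like 10 (assumption, see lead-in)
    if n % 10 == 1:
        return "one"
    if 2 <= n % 10 <= 9:
        return "few"
    return "other"

def with_noun(number: str, forms: dict[str, str] = FORMS) -> str:
    """Join a number with the noun form it requires."""
    return f"{number} {forms[plural_category(number)]}"

for sample in ["1", "5", "10", "21", "25", "30", "1,5"]:
    print(with_noun(sample))
```

When the number is hidden behind a variable tag, no rule of this kind can be applied at translation time, which is why the choice of form cannot be left to a stored segment.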

In some cases, to avoid the need to inflect nouns according to the numeral and the case, nouns are abbreviated where possible. Such a solution is particularly appropriate where the specific numeral is not known because a variable tag appears in the translatable segment and there is no reference material that provides information about the particular phrase. It also helps to keep target segments consistent. For example, instead of the full Lithuanian word for page (puslapis), the abbreviation p. is used: page 1 and pages 3-5 are translated as 1 p. and 3–5 p. instead of 1 puslapis and 3–5 puslapiai.

Lithuanian nominals have two genders, masculine and feminine; there are no neuter nouns. In translations from English it is often necessary to check the accuracy of full match segments carefully, as identical English segments containing neuter pronouns may require different Lithuanian translations because a masculine or feminine pronoun has to be used. For example, suppose the text refers to an object that is a masculine noun in Lithuanian. In English, when this item is replaced by a pronoun, it will be it, its, they or them, and in the Lithuanian translation it will be jis (he), jo (his), jie (they) or jiems (them). However, when the text refers to an object that is a feminine noun in Lithuanian, the identical English sentence has to be translated differently because the feminine pronoun is needed: ji (she), jos (her), jos (they), joms (them) and so on. In addition, it is not only the pronoun that differs but also the gender of the adjectives relating to it. So you can never simply press the “get translation” button without making sure that the 100% match is taken from the same context.

The rules for using quotation marks in Lithuanian also differ from those of most languages. For example, in English, company or brand names are written without quotation marks, while in Lithuanian they are mandatory. Clients are often dissatisfied with translations containing quotation marks, preferring a uniform layout of brochures or instructions across all localized languages. It is true that quotation marks can be avoided by distinguishing the names by other means, such as a different font, but often clients are not satisfied with this either and require that names not be distinguished at all. However, omitting required quotation marks is just as much an error as omitting any other punctuation mark.

In Lithuanian, word order in the sentence is grammatically free because the language is synthetic. However, in different situations and in different types of sentences there is a certain established order of words. Inversion (words swapping places) is always possible, but the sentence may then seem unusual and, moreover, its meaning may be altered.

Let’s analyze an example. The English sentence “I cannot translate this text” can be translated into Lithuanian in two ways: 1. Negaliu išversti to teksto. Here, it is the inability that is emphasized, for instance because I have no time for the translation. 2. To teksto išversti negaliu. Here, the particular text is emphasized: this text cannot be translated, for instance due to its complexity or my lack of competence.

Word order in Lithuanian does not perform grammatical functions; instead, it conveys meaning. A sentence with its words rearranged will generally remain grammatically correct, but its meaning may change significantly. For this reason, observing the consistency required by clients and by CAT tools is sometimes detrimental, even in a technical text, if not to the meaning then at the very least to the style. When translating a sentence, it is usually necessary to consider the emphasis of the previous sentence and to arrange the next sentence accordingly. However, when such a sentence is stripped of its context and automatically inserted into another context as a 100% match, the arrangement of its words can look very strange or even absurd.

Word order is one reason, among others, why CAT tools cannot be used uncritically for the Baltic languages.