Standards, terminology and Europe

We use standards every day, in all aspects of our lives. Some standards have been around for hundreds or even thousands of years. Think, for example, of weights and measures and how their differences and similarities affect us all. Standards provide a shared reference framework that ensures safety, reliability, interoperability and transparency, giving partners common expectations of each other’s performance and products.

Translation and terminology standards

In 1991, during the third TermNet Summer School in Vienna, Christian Galinski predicted that, before the turn of the millennium, the importance of terminology would be universally acknowledged. Galinski also predicted that terminology would earn its own place among C-level executives.

A quarter of a century later, terminology is still an ancillary discipline for a belittled profession, with a lot of specialized literature devoted to it. However, no standardization effort, no matter how smart, can keep pace with the evolution of technology.

Terminology plays a crucial role in accessing and managing information, especially today, but it is still a knowledge-intensive, labor-demanding human task, and users are increasingly unaware of, and possibly uninterested in, its principles and methods. Meanwhile, the many terminological standards available become obsolete as soon as they are published because of the slowness of the process and the verticality of topics and efforts.

Every year, TermNet, the Vienna-headquartered International Network for Terminology, organizes an online training with a final exam that requires the presentation of an application scenario. The course is sponsored by the European Certification and Qualification Association, a nonprofit association whose aim is to provide a worldwide unified certification schema for numerous professions. Sessions are held by academics and experts tackling the main aspects of terminology management. Participants are given useful information and examples, but almost no practical exercises on term extraction, stop-word list building, term data handling or other real-life scenarios. Instead, much time is devoted to data categories, data modeling, semantic interoperability and even team management theory.

How much time can translators — be they freelance or in-house linguists — really spend on terminology, if we consider the productivity level and the strict deadlines that are imposed by the various parties involved in a translation project?

From experience we know that translators hardly have the time to quickly click on the concordance option in a computer-aided translation (CAT) tool to browse through the translation memory they were given and add terms with a second click to a given term base. We also know that the exchange of term bases from one CAT tool to another will result in a loss of metadata and cause import problems.

In 2007 Erin McKean, a lexicographer and editor for the New Oxford American Dictionary, gave an enthusiastic TED Talk on the joys of lexicography. Her objective was clear even to a layman: the creation of an online dictionary collecting not only all the traditionally accepted words and definitions, but also new words and new uses for old words. The talk became a huge success.

McKean heads the world’s biggest online English dictionary by number of words. Example sentences are pulled from major news media (such as the Wall Street Journal) and from books available in the public domain (Project Gutenberg and the Internet Archive), as well as from other sources across the web, including less conventional ones such as blogs. The website also offers all sorts of information on each word: synonyms, hypernyms, hyponyms, words used in the same context, a reverse dictionary and tags.

Of course, there are differences between lexicography and terminology. One might suffice for all: while the former is descriptive, the latter tends to be normative, if not prescriptive. But McKean’s dictionary is pointing us in the right direction. Collaborative, cloud-based translation environments that allow the sharing of linguistic data (in the form of translation memories and term bases) coming from all the parties involved in a translation project are the best way forward.

A role for Europe

The Old Continent is where standardization was born and is still the homeland of translation studies, home to many research, staffing and resource organizations. And yet, most efforts have focused on updating terminology and translation standards and issuing new ones, without giving evidence of their actual impact, if any, on the evolution of society.

Like translation, terminology is a complex, time-demanding, knowledge-intensive task, and it can be hard to show its cost effectiveness and attractiveness. Potential users could possibly benefit from the definition and actual dissemination of basic criteria and requirements for using terminology and profiting from it.

As we write, a controversy is raging over the insolvency of four Italian regional banks. Unscrupulous staff from these banks allegedly pushed many unknowing customers to buy subordinated bonds. Customers had to sign long contracts written in the typical abstruse language of finance without being given any explanation of the nature of the bonds they were buying, and they eventually lost their life savings.

It would be difficult for anyone to find a comprehensive and yet concise description of what subordinated bonds are, for example. Wikipedia only offers an entry, in English, for “subordinated debt,” with the equivalent, in Italian, of debito non garantito (junior debt) containing a reference to an obscure credito chirografario (unsecured debt).

Forget InterActive Terminology for Europe (IATE), the interinstitutional terminology database of the European Union (EU) administered by the Terminology Coordination Unit (TermCoord) of the Directorate-General for Translation (DG TRAD) of the European Parliament. It has three entries for obbligazione subordinata, all marked as reliable, but their definitions mostly overlap and are inconsistent with standard methodology.

This should be solid evidence of the importance of terminology and of terminological resources, not only for translation but for everyday life. In fact, how many nonlinguists — and maybe even linguists — know of the existence of IATE?

And yet, this is not an isolated case. Fifteen years ago, at Linate Airport in Milan, Italy, an SAS airliner carrying 110 people collided on take-off with a business jet carrying four people. All 114 people on both aircraft were killed, as well as four ground personnel. Investigations identified a number of deficiencies in airport procedures, including violations of International Civil Aviation Organization regulations on the part of air traffic controllers, ranging from incorrect readbacks to the use of non-standard phraseology in communications, with a specific irrelevant term, “extension,” leading to a fatal misunderstanding.

All this calls into question the weight and trustworthiness of terminology standards. We also need to mention that neither the International Organization for Standardization (ISO) nor the other standard-setting bodies provide any public term base whatsoever. As for private term bases, a Common Sense Advisory survey revealed that only 41% of localization-mature organizations have some terminology management policy in place, almost solely translation-oriented.

Ten years ago, in an article in volume 13 issue 3 of KMWorld titled “The high cost of not finding information,” Susan Feldman reported that in 2001, the International Data Corporation (IDC) began to gather data on the costs an organization has to face when it doesn’t find the information it needs. IDC’s study showed that knowledge workers spent 15% to 35% of their time searching for information, that searches were successfully completed 50% of the time or less, and that only 21% of workers found the information they needed 85% to 100% of the time. The time spent looking for information and not finding it cost an organization a total of $6 million a year, not including opportunity costs or the costs of reworking the existing information that could not be located. Reworking the information that was not found cost that organization a further $12 million a year (15% of time spent in duplicating existing information). The opportunity cost of not locating and retrieving information amounted to more than $15 million per year.

Also, in a study for the EU-funded MULTIDOC project in 2010, Jörg Schütz and Rita Nübel claimed that terminology has a cost multiplier of 10 for localization and of 20 for maintenance.

Terminology management can be extremely costly in the short term, especially for a localization-negligent organization. According to a JD Edwards study presented at the TAMA conference in Antwerp in February 2001, one terminological entry has a cost of $150.00.

Many potential terminology users are possibly not very interested in standards, but have an interest in the associated terminology. Of the hundreds of standards available at ISO and regional standards bodies, more than half contain terminology. This could then be harmonized, structured and made publicly and freely available.

In November, the European Association for Terminology will celebrate its 20th anniversary in the historical first hemicycle of the European Parliament with a look back at the activity in terminology during the past 20 years. At the event, a prize will be awarded for the best thesis on terminology. Rather than financing mammoth Directorate-General for Translation (DGT)-oriented educational programs with the typical EU regulatory aim (have you ever heard of the bendy banana law?), the DGT could fund a program for the consolidation of the many dust-collecting terminological archives scattered all across the Old Continent in its innumerable universities. This program could be entrusted to a pool of outstanding graduates from the universities feeding the ranks of underpaid DGT interns.

Futurists, visionaries and wishful thinkers

In the last two decades, the ability to effectively use and integrate the wide range of software tools that form the typical translator’s toolbox has become pivotal. Today, translating is less a question of language knowledge and more one of knowing how to use it and the right tools to exploit it. The integration of machine translation (MT) into the now widespread, comprehensive and increasingly mundane translation tools is making MT and post-editing part of a translator’s daily job.

The past year definitively confirmed data as the lifeline of our online existence. With hardware increasingly being commoditized and software simply a click away, data is gold. Machine learning technologies are revolutionizing everything, from image recognition to voice transcription to MT. These technologies require massive amounts of training data.

Translators will have to be able to build parallel corpora; produce, access and use (big) data; mine unstructured data sets; and produce and manage rich terminology data.

Terabytes of translation data are produced in Europe alone every year. But, as Andrew Joscelyne and Anna Samiotou recently explained in the “TAUS Translation Data Landscape Report,” data sources are heterogeneous and unbalanced, and private owners can be reluctant to give away their translation data for free or even to open-source it. Traditional public sources of translation data are no longer enough. Incentives are necessary for a translation open data project in order to prevent any conflicts of interest.