The Maze Runner:
How to navigate out of the quality maze and bring objectivity into translation evaluation

By Angelika Vaasa and Dr Christopher Kurz


Quality. Not many words in the world of translation trigger such a wide range of emotions and opinions. Some say translation quality is not measurable. Others say it is. Some consider it to be at the core of translation philosophy and client satisfaction. Others view it as negligible and better left to the users to tell. Some say it is the eternal dichotomy between source and target text. In other words, quality can mean everything and nothing. It cannot be grasped. That’s it!

But is it so? Does quality, especially translation quality, feel like a maze of options and possibilities where we are lost because quality can be so many things? Do you think you have figured it out and can manage it, only to end up going around in circles, no closer to exiting the quality maze? How can we make sense of this complex notion and come up with a workable plan to navigate safely out of the maze of quality notions and perceptions?

We, the authors, have worked most of our lives in the translation industry and firmly believe that there is a workable way out of the translation quality maze. We believe that once you start demystifying quality and view it as something comprehensible, manageable, and measurable, it becomes an objective element in the translation workflow. This means that it can be managed the same way as deadlines, vendors, or budgets.

This article consists of two parts. Each part looks at translation quality from a different perspective. Part 1: The requirements perspective (written by Angelika Vaasa) discusses the importance of translation project specifications that reflect the needs of the clients and end-users and how the specifications become the foundation for translation quality evaluation. Part 2: The evaluation perspective (written by Dr Christopher Kurz) describes the way translation quality evaluation can be set up and carried out in an objective way to create reliable and trustworthy quality key performance indicators (KPIs) that will benefit all stakeholders. Each author has reviewed the other’s part and agrees with the opinions expressed.

Part 1: The requirements perspective

From needs to requirements

Every translation project should start with an analysis of needs and expectations, because meeting them will ensure a successful project outcome. However, relevant needs and expectations are not always sufficiently identified for translation projects. The reasons are varied. For example, needs and expectations can be considered self-evident because of an assumption that a good translation is a universal concept, so there seems to be no need to spend time articulating specific needs. Another reason can be that gathering detailed information is seen as time-consuming and cumbersome because translation projects often have a multitude of stakeholders, and their expectations can vary considerably. However, identifying needs and expectations is the first step towards managing quality, and handling this part superficially can have repercussions on the translation quality.

When considering needs and expectations, it is important to start the analysis by identifying the stakeholders to make sure that all stakeholders’ needs are addressed. The stakeholder who requests a translation service is a requester. From the perspective of the entity that provides the translation service, this can be an external client or a department in the same organization (e.g., a company or an institution). However, the requester is often not the end-user of the translation product. For example, the requester might need the translation product to reach or retain their customers or because they have a legal obligation to provide the information in another language. This means that the end-user is another important stakeholder to consider. Identifying whether the requester is also the end-user or not helps to make sure that all needs and expectations are considered.

When the requesters themselves cannot form an opinion on the translation product because they are not the end-users and do not understand the language, they might resort to Google Translate for back-translation (the least reliable way to check the quality!) or turn to other people for an opinion. The range can include anybody from the “native-speaker” colleague to an end-user of the translation product speaking the target language. Sometimes the requester is disappointed because the people consulted are not satisfied with the translation. One way to make sense of this is to say that translation quality is subjective and “in the eye of the beholder.” Asking end-users for their opinion to determine the quality of the translation product is called the “user-based” approach to quality. When the end-user needs are properly documented and become the yardstick for the translation, the “user-based” approach can be used as an important tool for translation quality management.

The user-based approach looks at quality as synonymous with satisfaction, and making this your only yardstick boosts the opinion that translation quality evaluation by language professionals is superfluous and unmanageable. However, a translation service provider should not rely solely on end-user opinions and disregard other approaches to quality (see the figure for five approaches to translation quality). For example, the product-based approach allows the quality to be objectively evaluated and, if implemented properly, also covers users’ needs and expectations. Incorporating all approaches will provide a solid base for efficient translation quality management.

When using both product-based and user-based approaches, the translation might be considered good during the quality evaluation by a language professional but fail to achieve high end-user satisfaction. What to do if the product-based evaluation and user-based evaluation deliver different results? Does it mean that the evaluation by language professionals is not reliable and thus not worth spending time on? Not really, and a closer look at the evaluation strategies and systems of the translation service provider can provide some answers.

Experience shows that often this discrepancy occurs because insufficient attention was paid to defining the needs and expectations of the end-users at the start of the translation project. The requesters should know their end-users and be able to define the common needs of end-users if they want the translation product to meet these needs and expectations. Asking the translation service provider to “just translate” is not enough. Unfortunately, this is precisely what the translation service providers are often asked to do.

Imagine the following situation: You go to a department store and say to the shop assistant that you need a present for your niece. Will the shop assistant just suggest something to give as a present? Probably not. They will ask you questions about your niece: her age, hobbies, and interests. This information increases the chances of finding a present that your niece would be happy about. And if she is happy, you will be happy as a client of the service. Similarly, the requesters and the translation service providers should have a conversation about the end-user needs, expectations, and preferences. It is the responsibility of the translation service provider to ask questions to elicit the necessary information. It is a sign of a well-run localization department if the needs of an internal requester can be described easily and precisely at the necessary level of detail. In the case of company-internal localization projects, it can also be the responsibility of the localization department to go and find out who their end-users are and what they expect.

To structure the conversation about the needs and expectations, it is good to have a list of topics at hand that are specific to translation projects and guarantee that you have covered all important aspects. For example, the purpose of translation and its target audience are important parameters to define. ASTM F2575 Standard Guide for Quality Assurance in Translation and the upcoming revised version of ISO 11669 Translation projects – General guidance provide a list of translation parameters to have up your sleeve when determining the requirements for translation projects.

In addition to end-user requirements, there are sometimes other implied or obligatory requirements that the translation product needs to fulfill. These can be even more important than the end-user needs. There can be legal requirements that need to come first. For example, in the food and beverage industry, there is European legislation that must be followed when translating food labels. A recent judgement of the Court of Justice (case C-881/19) concerned the translation of the ingredient “chocolate powder” into Czech. This ingredient should have been translated precisely following the wording of the relevant law as “čokoláda v prášku.” It was translated as “čokoládový prášek,” and the argument by the company that these two terms were “substantively identical” was not accepted by the court, which ruled that the official language version of the law should have been followed. Thus, there are situations where end-user satisfaction will not be relevant because there are legal obligations that need to be met.

From requirements to specifications

Once the requirements have been identified, they need to be turned into translation project specifications. Translation parameters will help turn requirements into specifications. Here is an example of how questions related to a parameter help to formulate a specification.

The set of all identified specifications will be a reference document used throughout all stages of the translation process and it is referred to as the translation project specifications or the translation brief. It will be a guide for the language professionals working on the project. For example, information about the translation’s target audience enables the translators to put themselves in the shoes of the reader and tailor the language for end-users. For the example of a tourist brochure, it is important to know what type of target audience we are hoping to convince to visit Estonia. Depending on whether we are translating to convince the Austrians or the Swiss or the Germans and whether it is a young audience or a more mature one, the language can be tailored to meet the expectations of the end-users. If this information is missing and the translation is produced with no specific audience in mind, it has a lower chance of generating user satisfaction.

Specifications bring us back to the product-based approach to quality because they are at the core of it. Quality in the context of quality management (ISO 9000 series of quality management standards) is defined as the “degree to which a set of inherent characteristics of an object fulfils requirements.” For translation projects, it means that translation quality is about the translation product fulfilling the requirements that have been stated in the translation project specifications. Therefore, quality can be objectively evaluated if the requirements have been documented as specifications. Specifications are the key to the evaluation of quality because the quality is measured against them. For example, if the specifications required the use of a specific style guide and the style guide has been followed, the translation fulfils this requirement.

What if there are no specifications? The translation service provider can still produce a translation applying a generic approach if they have minimum information available (you do need to know the target language at least), but it will be the same as the shop assistant giving you something they think is a good present for your niece without knowing anything about her. You might get lucky, and the niece is fine with it, but it could also go the other way. In the case of insufficient specifications, we cannot evaluate quality objectively because we have no requirements to check against. In this case, end-user satisfaction will be a gamble because it depends on the user’s perceived fulfillment of personal and individual expectations. If end-user needs and expectations are not known, it is hard to meet them. Therefore, those who are serious about achieving user satisfaction and managing translation quality should always create translation project specifications and include them as part of the agreement or contract between the requester and the translation service provider.

Relationship between quality and satisfaction

Keeping quality and satisfaction as two distinct coexisting concepts with two separate definitions helps to avoid muddying the waters and getting hopelessly lost in the quality maze. We can look at satisfaction as a perception of the extent to which personal and individual expectations have been met. Since it is a perception, it is subjective. And we can consider quality as measurable in relation to fulfilled requirements. Since it is measurable, it can be objective. An extensive and objective translation quality management approach should include both of these concepts.

Even though we look at quality and satisfaction as two separate concepts, they are like two sides of the same coin. If the quality management system is functioning well, the translation quality evaluated by a language professional will correspond to end-user satisfaction. To achieve this, it is important that the translation project specifications pay sufficient attention to defining end-user needs and expectations. The quality evaluation system must also be built on these specifications, as we will explore later.

Part 2: The evaluation perspective

From subjectivity to objectivity

When it comes to evaluating translations, opinions and strategies vary widely. Some people judge translations according to their gut feeling, others according to what they feel constitutes “good language,” and some according to what they think is “the typical jargon of a specialization field.” What these translation evaluation perspectives very often have in common is an idiosyncratic, personal judgement about what is right or wrong and therefore usually a high degree of subjectivity. These kinds of quality judgements (I struggle to call them “evaluations”) often lead to discussions between translation service providers and requesters as well as to arguments about whether the criticism of a translation was justified or not. It is exactly this kind of subjective judgement that has been present for decades and centuries and that does harm to the whole modern translation profession as well as to the translation industry worldwide. And it is exactly these often unfounded, biased, and indefensible claims that boost the cleverly crafted narrative that human translators will be completely replaced by machine translation systems sooner rather than later.

The opposite of these often merely gut-feeling-based judgements is the analytic translation evaluation. Analytic evaluations put translation evaluation into a methodical context and set up an intelligent, process-based general evaluation framework. Such evaluations are usually embedded in an entire translation quality management process. An important part of this evaluation approach is that the outcome is based on facts and data.

All analytic evaluations follow a defined process that focuses on a comparison of the target text with the source text. This comparison detects non-conformities with the translation project specifications. These non-conformities are regarded and annotated as errors, and therefore choosing a set of suitable error types such as mistranslation, terminology, or spelling is an important part of analytic translation evaluation (for an extensive list of error types, see the MQM homepage).

In addition to error types, severity levels should be chosen. The severity levels (e.g. minor, major, critical) mirror the impact of an error on the translation, so an error with only a small impact would be considered minor, whereas an error with a bigger impact on the translation would be major or even critical. For example, you could use the error type spelling in your analytic translation evaluation and assign 2, 4, and 8 penalty points to the severity levels minor, major, and critical. In contrast, the error type mistranslation might have 2, 6, and 14 penalty points assigned to these severity levels. Obviously, the chosen error types and the penalty points assigned to the respective severity levels depend on the purpose and the final use of the translation. Spelling mistakes in a marketing translation could be assigned more penalty points than spelling mistakes in a car repair manual that is not meant for the end-user of the car. 
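To make the scheme concrete, here is a minimal Python sketch. The point values are the illustrative ones mentioned above; the table name, the function name, and the sample annotations are hypothetical and would be replaced by whatever your own evaluation system defines.

```python
# Hypothetical penalty table using the illustrative values from the text.
# A real system would define its own error types and point values.
PENALTY_POINTS = {
    "spelling": {"minor": 2, "major": 4, "critical": 8},
    "mistranslation": {"minor": 2, "major": 6, "critical": 14},
}


def total_penalty(errors):
    """Sum the penalty points for a list of (error_type, severity) annotations."""
    return sum(PENALTY_POINTS[error_type][severity] for error_type, severity in errors)


# Made-up annotations from one evaluation.
annotated_errors = [
    ("spelling", "minor"),
    ("spelling", "major"),
    ("mistranslation", "major"),
]

print(total_penalty(annotated_errors))  # 2 + 4 + 6 = 12
```

Keeping the points in one table per text type (marketing, repair manual, and so on) is one way to reflect that the weighting depends on the purpose of the translation.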

One severity level that should be considered is the critical severity level. This severity level is meant for errors that are so grave that they render the whole translation instantly unusable. Examples could be errors that pose danger to the life and limb of the end-user or errors that can lead to severe reputational or financial loss. Errors that can cause, for example, mechanical breakdowns of machines are also critical. If such errors are encountered during a translation evaluation, the result must be an instant fail of the translation.

A clear and objective quality indicator and an important KPI is the error score. It is calculated by dividing the sum of all penalty points by a measure of the size of the evaluated text, e.g. the total number of words evaluated (for example, 17/2650 ≈ 0.0064). The result of this division is an error score that can be used to determine the quality rating of the translation by matching the error score to a pre-defined quality rating system. For example, the error score of 0.0064 would mean that the translation is of sufficient quality in the following quality rating system:

  • Error score ≤ 0.0030 = good translation quality
  • Error score 0.0031 – 0.0090 = sufficient translation quality
  • Error score ≥ 0.0091 = bad translation quality.
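The calculation and the rating lookup can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that 0.0030 and 0.0090 are treated as inclusive upper limits of their bands and that a critical error overrides the score with an instant fail; your own rating system may draw these lines differently.

```python
def error_score(penalty_points, word_count):
    """Penalty points divided by the size of the evaluated text (here: words)."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    return penalty_points / word_count


def quality_rating(score, has_critical_error=False):
    """Map an error score to a rating band.

    Assumptions: band limits are inclusive upper bounds, and any
    critical error results in an instant fail regardless of the score.
    """
    if has_critical_error:
        return "fail"
    if score <= 0.0030:
        return "good"
    if score <= 0.0090:
        return "sufficient"
    return "bad"


score = error_score(17, 2650)
print(round(score, 4))        # 0.0064
print(quality_rating(score))  # sufficient
```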

Important and widely adopted industry standards such as the LISA QA model and SAE J2450 follow these principles. The upcoming ISO standard ISO 5060 – Evaluation of Translation Output also follows this analytic, number-based approach.

The final step on the objectivity ladder: Making evaluations independent of the person

If you define and communicate the translation project specifications and use an analytic evaluation approach, you will achieve inter-rater reliability (IRR). This term means that if your evaluation system is aligned with the described process and your evaluators are properly trained to work with it (one very important point that is often forgotten), the outcome of a translation evaluation will no longer depend on the particular evaluator. A properly set up translation evaluation system and proper training will ensure that a translation evaluated by different evaluators produces almost the same result.
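The article does not prescribe a statistic for checking this. One common choice, used here purely as an assumption, is Cohen’s kappa, which compares the observed agreement between two evaluators with the agreement expected by chance. The Python sketch below applies it to made-up “error or not” decisions (1 = error, 0 = no error) on the same ten segments.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions by two evaluators on ten segments.
evaluator_1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
evaluator_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(round(cohens_kappa(evaluator_1, evaluator_2), 2))  # 0.74
```

A value of 1 means perfect agreement and 0 means agreement no better than chance. Tracking such a figure over time is one possible way to see whether training and group reviews are bringing the evaluators closer together.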

Taking a closer look at the training of evaluators, one basic element is the process of progressing with experience and being assigned to tasks which require more responsibility (known as “climbing the ladder”). This means that evaluators should have several years of experience working as translators before taking up the role of evaluators. The reason behind this is the experience you gain over the years and the ability to understand and detect translation errors faster and more reliably. It is debatable whether it makes sense to let junior translators evaluate senior translators’ translations as they might lack the experience, translation skills, technical insight, and knowledge of the evaluation strategies to detect problematic or mistranslated text passages. 

Along with the experience of evaluators, training is very important when setting up a professional evaluation system. The first step is for evaluators to familiarize themselves with the company’s overall evaluation strategy, the error type definitions, the boundaries of each severity level, and the purpose of the translation and the evaluation. These topics can be elaborated upon in workshops, and they lay the foundation for a basic understanding of the whole concept of “professional and analytic translation evaluation.” The next step is regular communication among the group of evaluators and systematic reviews of the evaluations. Decisions on evaluations should be discussed, questioned, and defended within the group of evaluators and checked against the evaluation strategy and, of course, the translation project specifications. All this can lead to inter-rater reliability, or at least to a very high degree of it. This approach has been adopted and proven in practice in the European translation industry by many translation service providers and clients, and this is basically the point where real evaluation objectivity and real translation quality management begin.

Making the analytic approach work

As outlined in Part 1, professional translations should be aligned with the client’s requirements and be produced according to the agreed-upon and communicated translation project specifications. The globally known and extensively adopted ISO standard ISO 17100 Translation Services – Requirements for translation services elaborates on this principle in detail.

So, when it comes to evaluating translations, it is important to set up an analytic translation evaluation system that refers strictly and only to the translation project specifications. Only this referral and a direct link will enable the evaluator to decide in a few seconds whether the carefully examined translation passage contains a translation error or not. To enable the evaluator to do this task, the nature and the definition of error types used in the evaluation system should be of binary nature, meaning that the decision “error or not” should be clear cut and not something the evaluator and the translator can have lengthy discussions about. If you choose error types where the evaluator will not be able to easily determine in a binary manner whether a translation error is present or not, you should leave these error types out of the translation evaluation system. They might blur the powerful effect of the evaluation system. If in doubt — leave it out!

Bringing in professionalism and restraint

And this leads us to a classic industry myth: You cannot judge style because we are all so different and work so differently when translating texts. This is partly true. Yes, we are all different individuals with our own personal style. But this is what style guides are for (good style guides, at least), and adherence to these style guides must be a part of the translation project specifications. Defining in the style guide that a certain target language expression (meaning not only terminology) needs to be used for a source language expression (and this is what corporate language is all about) enables you to decide quickly whether the translation is compliant with the style guide. It might look like a big mountain to climb, but once you start setting up a style guide you will notice that this task is actually manageable. On the other hand, if a certain expression is not listed in the provided style guide and the translator’s choice is slightly different from your preference, leave the translation as it is and keep a professional evaluator’s distance! You are evaluating a translation, not adapting the translation to your personal liking. Preferential changes, which are most often stylistic, are poison to an objective analytic evaluation approach.

This approach will bring a considerably high degree of objectivity into translation quality evaluation, something that is often claimed to be impossible to achieve. And it will remove the perception of translation as a subjective art and replace it with a (desperately needed) industrial production perception. And yes: All this is possible if you plan your evaluation strategy and your translation evaluation system meticulously. The described strategy helps enormously in fostering the acceptance of evaluation schemes and scorecards and can be a real game-changer in your translation quality management and translation evaluation process.

Remaining doubts?

If we circle back to the user-based approach, we can say that asking different end-users for their quality rating is like trying to score a goal in football, but someone is constantly moving the goalposts. End-users often cannot give a reason why they like or dislike something and their ratings are very often subjective and erratic (just have a look at product ratings on Amazon…). That is why we cannot take user ratings and satisfaction scores as the only quality yardsticks. They do provide an important indication, but about satisfaction, not quality.

Therefore, we would like to stress the importance and the possibility of establishing translation quality KPIs based on the analytic translation evaluation. “What cannot be measured doesn’t exist” is a statement you may hear in the translation industry. However, major companies in all industries around the world base their strategic business development decisions on KPIs, or at least the successful ones do (even the “fancy,” “disruptive,” and “rock-star-like” ones in Silicon Valley). So why should the translation industry be different, and why should your quality KPIs be based on gut feelings and individual, idiosyncratic preferences? The analytic evaluation approach enables you to base your business development decisions (extension of a supplier contract, adding new languages to the contract, canceling a supplier contract) on solid ground and trustworthy, reliably established data.

The safe way out of the maze

Demystifying translation quality starts by acknowledging that quality can be managed, but it requires a multi-pronged approach. Here are eight tips that can serve as your trail of breadcrumbs to navigate out of the quality maze and bring objectivity into translation quality evaluation.

  • Distinguish between “quality” and “satisfaction.”
  • Base your quality management system on the product-based quality approach. The “degree to which a set of inherent characteristics of an object fulfils requirements” is your golden rule.
  • Define the needs and expectations of different stakeholders by applying the user-based quality approach. Consider the characteristics of the end-users and the differences among different groups of end-users.
  • With the help of translation parameters, turn the needs and expectations into translation project specifications, and make sure they are communicated to all stakeholders.
  • Use translation project specifications as a basis for your analytic translation quality evaluation.
  • Set up your analytic translation quality evaluation by choosing a suitable set of error types and severity levels. Be careful to define your error types as binary.
  • Make sure that your evaluators have proper experience and training and that they regularly discuss and review evaluations.
  • Use the results of your analytic quality evaluation as a basis for your key performance indicators. You should be able to measure and express in KPIs every aspect of your translation quality management system.

Planning and setting up your translation evaluation process as meticulously as your translation process and reviewing it regularly is crucial for a successful translation process that delivers a translation product with the required quality level. Establishing clear translation project specifications and evaluating the quality of translation products using an analytic evaluation approach will make a difference and create solid translation quality data. All this will help you to get out of the quality maze and give you tools to measure and discuss translation quality objectively.

Angelika Vaasa holds master’s degrees in languages, interpreting, and European legal studies. She works at the translation services of the European Parliament, where she manages translation quality. She is also a co-project leader for the upcoming revised version of ISO 11669 Translation projects – General guidance.

Dr Christopher Kurz holds a PhD in translation quality management from the University of Leipzig where he studied translation. He is the Head of Translation Management at ENERCON, a leading global supplier of wind turbines. He is also the project leader of the upcoming ISO standard 5060 – Evaluation of Translation Output. Together with Jean-Marc Dalla-Zuanna, he is the co-editor of Translation Quality in the Age of Digital Transformation.

MultiLingual Media LLC