Linguistic quality assurance in localization

In pretty much any industry these days, the notion of quality crops up all the time. Sometimes it feels like it’s used merely as a buzzword, but more often than not quality is a real concern, both for the seller and the consumer.

Obviously, when it comes to translation and localization, quality has characteristics all its own compared to other services. Ultimately, however, it remains the expected goal of any project.

Quality assessment and quality assurance

Anyone who has ever been asked to proofread knows that task definitions are important. Despite the fact that industry standards have been around for quite some time, in practice terms such as quality assessment, quality assurance and sometimes even quality evaluation are often used interchangeably. This may be due to a misunderstanding of what each process involves, but whatever the reason, this practice leads to confusion and could create misleading expectations. So, let us take this opportunity to clarify. On the one hand, translation quality assessment (TQA) is the process of evaluating the overall quality of a completed translation by using a model with predetermined values that can be assigned to a number of parameters used for scoring purposes. On the other hand, quality assurance (QA), as defined by Joanna Drugan in her book Quality in Professional Translation: Assessment and Improvement, “refers to systems put in place to pre-empt and avoid errors or quality problems at any stage of a translation job.”

Quality is an ambiguous concept in itself and making objective evaluations is a very difficult task. Even the most rigorous assessment model requires subjective input by the evaluator who is using it. However, if we can distinguish in a translation workflow between what a translator or reviewer can do while the project is still in progress, and what can be done after the project is completed, then we can get a better sense of what each process involves and how we can best allocate our human and technological resources in order to improve quality overall. When it comes to linguistic quality in particular, we would be looking to address issues that have to do with punctuation, terminology and glossary compliance, locale-specific conversions and formatting, consistency, omissions, untranslatable items and others. It is a job that requires a lot of attention to detail and strict adherence to rules and guidelines, and that’s why linguistic quality assurance (LQA) — most aspects of it, anyway — is a better candidate for objective automation.

Industry practices, good and bad

Given the volume of translated words in most localization projects these days, it is practically prohibitive in terms of time and cost to have in place a comprehensive QA process that would safeguard certain expectations of quality both during and after translation. Therefore it is very common for QA, much like TQA, to be reserved for the post-translation stage. A human reviewer, with or without the help of technology, will be brought in when the translation is done and will be asked to revise the final product. The obvious drawback of this process is that significant time and effort could be saved if revision could somehow occur in parallel with the translation, perhaps by involving the translator in the process of tracking errors and making corrections along the way.

The fact that QA seems to take place only after the fact is not the only problem, however. Volumes are another challenge — too many words to revise, too little time and too expensive to do it. To address this challenge, language service providers (LSPs) use sampling (the partial revision of an agreed small portion of the translation) and spot-checking (the partial revision of random excerpts of the translation). In both cases the proportion of the translation that is checked is about 10% of the total volume of translated text, and that is generally considered sufficient to judge whether the whole translation is acceptable or not. This is an established and accepted industry practice that was born out of necessity. However, one doesn’t need a degree in statistics to appreciate that such a small sample, whether defined or random, is hardly big enough to reflect the quality of the overall project.

The restrictions described here are a consequence of the cost involved in using human revisers for all the tasks required in LQA for very large volumes of translated text. It would make sense, then, to enlist the help of technology to process these large volumes of text and provide the support necessary for a more thorough QA process. The technology is there, but there are a few things to consider. In her 2007 paper Translation Quality Assurance Tools: Current State and Future Approaches, Julia Makushina explains that: “Whereas translation memory tools came into the market approximately in 1985, translation quality assurance tools are rather young. The oldest quality check utilities were probably incorporated back in 1998.… This means there is a 10-15 years gap in TM and QA tools development.” The progressive increase in the volume of text translated every year (also reflected in the growth of the total value of the language service industry) and the increasing demand for faster turnaround times make it even harder for QA-focused technology to catch up. The need for automation is greater than ever before. Let us then have a brief look at the history and the current state of affairs.

The evolution of QA technology

When the first computer-assisted translation (CAT) tools were introduced in the mid-1980s, the only means for quality assurance was effectively a human proofreader. Spellcheckers then slowly became more diverse and more popular in word-processing applications. Later on, terminology management tools, which started emerging as companions to translation memories, provided a second layer of quality assurance checks, and in the late 1990s all these functions were incorporated in the first CAT tool to offer this kind of range. Other tools followed this example for more than a decade; CAT tools would develop QA functionality and include it in their suite of applications and plug-ins. The first tool designed and developed as a stand-alone QA-focused application was officially launched only in 2004.

This staggered evolution of quality assurance technology presents an interesting dynamic. It is obvious that, over time, the need for QA checks became more and more pressing, as the automated processes supported by continuously developing CAT tools created the conditions for such functions to emerge. When the first standalone QA tools came about, CAT tools were already well ahead in terms of development. However, QA had never been a part of the core business for CAT software developers. In the early days, QA checks were a nice thing to have, but it took years before they were considered essential. Nowadays the situation is different: more and more CAT and QA tools have emerged, online CAT systems are becoming common and the demand for more efficient technology is growing fast.

Today we could classify QA technologies into three broad groups: built-in QA functionality in CAT tools (offline and online); standalone QA tools (offline); and custom QA tools developed by LSPs and translation buyers (mainly offline).

Built-in QA checks in CAT tools range from the completely basic to the quite sophisticated, depending on which CAT tool you’re looking at. Standalone QA tools are mainly designed with error detection in mind, but there are some that use translation quality metrics for assessment purposes — so they’re not quite QA tools as such. Custom tools are usually developed in order to address specific needs for a client or a vendor who happens to be using a proprietary translation management system or something similar. This obviously presupposes that the technical and human resources are available to develop such a tool, so this practice is rather rare and exclusive to large companies that can afford it.

Regardless of which of these three types of QA tool we examine, in an average localization workflow there are issues of integration that are worth looking at in more detail. For now, let’s focus on what this technology can do for us.

Methodology

Terminology and glossary compliance, empty target segments, untranslated target segments, segment length, segment-level inconsistency, different or missing punctuation, different or missing tags, different or missing numeric or alphanumeric structures — these are the most common checks that one can find in a QA tool. On the surface, at least, this looks like a very diverse range that should cover the needs of most users. All of these are effectively consistency checks: if a certain element is present in the source segment, then it should also exist in the target segment. It is easy to see why this kind of “pattern matching” lends itself to automation, and translators and reviewers certainly appreciate a tool that can do it for them far more quickly and accurately than they could themselves.
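To make the mechanics concrete, here is a minimal sketch, in Python, of the kind of number-consistency check most tools run under the hood. The function name and output format are invented for illustration; real tools apply the same logic to tags, punctuation and terminology as well.

```python
import re

def number_consistency_issues(source: str, target: str) -> list:
    """Flag any digit sequence that appears in one segment but not the other.
    The comparison is deliberately locale-blind, exactly like the default
    consistency checks described in the text."""
    source_numbers = re.findall(r"\d+", source)
    target_numbers = re.findall(r"\d+", target)
    issues = []
    for number in source_numbers:
        if number not in target_numbers:
            issues.append("number '%s' missing from target" % number)
    for number in target_numbers:
        if number not in source_numbers:
            issues.append("number '%s' not found in source" % number)
    return issues
```

Run against the English-French example that follows, a check like this flags three number “errors” even though the translation is perfectly correct.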

Despite the obvious benefits of these checks, the methodology they rely on has significant drawbacks. Consistency checks are effectively locale-independent, and that creates false positives (the tool detects an error when there is none) and false negatives (the tool doesn’t detect an error when there is one). Let’s look at an example.

Source [en-GB]: Could we meet on 3/4/2017 at 2:30pm?

Target [fr-FR]: Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?

With the exception of a couple of systems, and even then only after substantial customization by the user, QA tools that rely on consistency checks would produce no fewer than four false positives or false negatives in this plain-looking segment:

3/4/2017 to 3 avril 2017: number 4 is missing from the target, so that would be marked as an error. However, we know the date is correctly localized, so that’s a false positive.

2:30pm to 14h30: number 2 doesn’t exist in the target and number 14 doesn’t exist in the source, so both of these would be marked as errors. However, we know the time has been correctly localized, so these are both false positives.

2:30pm? to 14h30?: the required space that is missing before the question mark in the target would not be marked as an error, but we know that’s an error in the target locale, so that’s a false negative. Interestingly, this would not be a false negative if the target locale were fr-CA.

One can imagine how many issues such as the above can show up in a QA error report. Noise is one of the biggest shortcomings of the QA tools currently available, and that is because of the lack of locale specificity in the checks provided. It is in fact rather ironic that the benchmark for QA in localization doesn’t involve locale-specific checks. To be fair, in some cases users are allowed to configure the tool in greater depth and define such focused checks on their own (either through existing options in the tools or with regular expressions). But this makes the process more labor-intensive for the user, and it comes as no surprise that the majority of users of QA tools never bother to do so. Instead they perform their QA duties relying on the suboptimal consistency checks that are available by default.
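As an illustration of what such a user-defined, regex-based check might look like, here is a minimal sketch for the French punctuation rule in the example above. The function and the simplified rule are assumptions for the sake of the example, not the configuration syntax of any particular tool.

```python
import re

# fr-FR typography requires a (non-breaking) space before "high" punctuation
# such as ? ! : ; whereas fr-CA does not. A locale-aware check therefore only
# fires for the locales where the rule actually applies.
PUNCT_GLUED = re.compile(r"\S[?!:;]")

def missing_space_before_punct(target: str, target_locale: str) -> bool:
    """Return True if the target breaks the fr-FR spacing rule."""
    if target_locale != "fr-FR":
        return False
    return bool(PUNCT_GLUED.search(target))

segment = "Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?"
print(missing_space_before_punct(segment, "fr-FR"))  # True: genuine error
print(missing_space_before_punct(segment, "fr-CA"))  # False: not an error in fr-CA
```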

Linguistic quality assurance is (not) a holistic approach

It is now time to look at the issue of workflow. Localization and quality assurance are processes more than anything else and, when one tries to bring the two together, a number of challenges emerge. In theory, a QA process of some kind should be in place for all three main stages of a localization project:

1) Pre-translation: by ensuring that the source content that is to be translated is written clearly and well (and, in cases of large localization projects, that it has been properly internationalized), the risk of having to deal with all sorts of issues later on in the workflow of the project is minimized. This is a topic that has also been aptly discussed in the context of machine translation.

2) In-translation: when issues arise during translation, such as terminology and other types of inconsistencies, or locale- and context-specific errors, translators should have the tools to react quickly and make corrections whenever required. This way, they won’t have to deal with the same problems further down the line in the same project. A certain level of support for such issues is provided by some CAT tools, but it is rather patchy.

3) Post-translation: a thorough review of the translated content is normally reserved for after the job is completed, at which point the reviewer needs to find all the errors, fix them and make sure that no new errors are introduced in the process.

In practice, for the majority of large-scale localization projects only post-translation LQA takes place, mainly due to time pressure and associated costs — an issue we also explored earlier in connection with the practice of sampling. The larger implication of this reality is twofold: a) effectively we should be talking about quality control rather than quality assurance, as everything takes place after the fact; and b) quality assurance becomes a second-class citizen in the world of localization. This contradicts everything we see and hear about the importance of quality in the industry, where both buyers and providers of language services treat quality as a prime directive.

As already discussed, the technology does not always help. CAT tools with integrated QA functionality have a lot of issues with noise, and that is unlikely to change any time soon, because this kind of functionality is not a priority for a CAT tool. On the other hand, standalone QA tools with more extensive functionality work independently, which means that any collaboration between standalone QA tools and CAT tools can only be achieved in a cumbersome, stop-and-go workflow: complete the translation, export it from the CAT tool, import the bilingual file into the QA tool, run the QA checks, analyze the QA report, go back to the CAT tool, find the segments that have errors, make corrections, update the bilingual file and so on.

All this has to be done manually, with a lot of configuration in order to account for locale conventions and user preferences. If the CAT tool happens to be an online platform, the inherent problems of this workflow are exacerbated even further, since the online CAT tool then needs to somehow be used alongside a desktop offline QA tool. It is no surprise, then, that translators and reviewers often refuse to adopt this kind of workflow, given how much time needs to be spent analyzing error reports and making corrections in different environments. Some desktop QA tools have in the last few years developed plug-ins for various popular CAT tools. However, this connectivity doesn’t really address the issue of workflow: as a reviewer, you still have to switch between platforms all the time in order to confirm and fix errors, and you are still doing all of this after the translation is complete.

The challenges described above will have to be addressed soon. As online technologies in translation and localization gain ground, there is an implicit understanding that existing workflows will have to be simplified in order to accommodate future needs in the industry. This can indeed be achieved with the adoption of bolder QA strategies and more extensive automation. The industry’s need for a more efficient and effective QA process is here now, and it is pressing. Is there a new workflow model that can produce tangible benefits both in terms of time and resources? We believe there is, but it will take some faith and boldness to apply it.

Changing the game of LQA

There are a number of use cases by language vendors and translation buyers that support the idea that something needs to change. We probably all know translators, reviewers or managers who have expressed their true feelings about the QA process they currently have to follow in their work. In less than equal measure, we probably also know people in the industry who are more than happy to maintain the status quo. Managing the QA process can obviously be quite a different experience compared to actually performing the QA with tools and workflows that fall short of the demands for quality in the industry today. In many respects, change management and making a case for a new process can be more challenging than the new process itself. It is easy to stay put and resist change, even when you know that what you’re doing now is inadequate.

There is a way around this stagnation: get ahead of the curve! In the last few years, the translation technology market has been marked by substantial shifts in the market shares occupied by offline and online CAT tools respectively, with online tools rapidly gaining more ground. This trend is unlikely to change. At the same time the age-old problems of connectivity and compatibility between different platforms will have to be addressed one way or another. For example, slowly transitioning to an online CAT tool and still using the same offline QA tool from your old workflow is as inefficient as it is irrational, especially in the long run.

A deeper integration between CAT and QA tools also has other benefits. The QA process can move up a step in the translation workflow: why have QA only in post-translation when you can also have it in-translation? This shift is made possible by software that exposes application programming interfaces (APIs), which are in fact already standard practice for the majority of online CAT tools. There was a time when each CAT tool had its own proprietary file formats (as they still do), and then the TMX and TBX standards were introduced and the industry changed forever, as it became possible for different CAT tools to “communicate” with each other. The same will happen again, only this time APIs will be the agent of change.
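As a rough illustration of what in-translation QA over an API could look like, consider the following sketch in Python. The endpoint, payload and response format are invented for the example, since every CAT and QA platform defines its own API, and the third-party requests library is assumed to be available.

```python
import requests  # third-party HTTP client, assumed to be installed

# Hypothetical QA service endpoint, used here for illustration only.
QA_ENDPOINT = "https://qa.example.com/v1/check"

def check_segment(source: str, target: str, source_locale: str, target_locale: str) -> list:
    """Send a confirmed segment pair to a QA service and return any issues."""
    payload = {
        "source": source,
        "target": target,
        "sourceLocale": source_locale,
        "targetLocale": target_locale,
    }
    response = requests.post(QA_ENDPOINT, json=payload, timeout=10)
    response.raise_for_status()
    # Assumed response shape: {"issues": [{"type": "...", "message": "..."}]}
    return response.json()["issues"]

# Wired to a CAT tool's "segment confirmed" event, a call like this would flag
# errors while the translator is still working on the text, not after delivery.
```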

Looking further ahead, there are also some other exciting ideas that could bring about truly innovative changes to the quality assurance process. The first is the idea of automated corrections. In much the same way that a text can be pre-translated in a CAT tool when a translation memory or a machine translation system is available, a QA tool that has been preconfigured with granular settings could “precorrect” certain errors in the translation before a human reviewer even starts working on the text. In a deeper integration scenario with a CAT tool, an error could be corrected in a live QA environment the moment a translator makes it.
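A deterministic, locale-specific rule of the kind that could be precorrected automatically might look like the sketch below, which simply repairs the fr-FR spacing error from the earlier example. The rule and the function are illustrative assumptions, not a feature of any existing tool.

```python
import re

def precorrect_fr_fr(target: str) -> str:
    """Insert a narrow no-break space before ? ! : ; when it is missing,
    as required by fr-FR typography. Applied before review, much as a text
    can be pre-translated from a translation memory."""
    return re.sub(r"(?<=\S)([?!:;])", "\u202f\\1", target)

print(precorrect_fr_fr("Est-ce qu’on peut se rencontrer le 3 avril 2017 à 14h30?"))
# Output ends in "14h30 ?" with a narrow no-break space before the question mark.
```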

This kind of advanced automation in LQA could be taken a step further still if we consider the principles of machine learning. Access to big data in the form of bilingual corpora that have been checked and confirmed by human reviewers makes this approach all the more feasible. Imagine a QA tool that collects all the corrections a reviewer has made and all the false positives the reviewer has ignored, and then processes all that information and learns from it. With every new text processed, the machine learning algorithms make the tool more accurate about what it should and should not consider an error. The possibilities are endless.
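Even without sophisticated models, the feedback loop itself can be sketched in a few lines. The class below is a deliberately simplified assumption: it merely suppresses categories of issues that reviewers almost always dismiss for a given language pair, where a production system would learn from much richer features.

```python
from collections import defaultdict

class FeedbackFilter:
    """Record reviewer decisions on raised issues and suppress categories
    that are dismissed as false positives nearly every time."""

    def __init__(self, dismiss_threshold: float = 0.9, min_samples: int = 20):
        self.stats = defaultdict(lambda: {"raised": 0, "dismissed": 0})
        self.dismiss_threshold = dismiss_threshold
        self.min_samples = min_samples

    def record(self, check: str, locale_pair: str, dismissed: bool) -> None:
        # Called whenever a reviewer confirms or dismisses a flagged issue.
        entry = self.stats[(check, locale_pair)]
        entry["raised"] += 1
        entry["dismissed"] += int(dismissed)

    def should_suppress(self, check: str, locale_pair: str) -> bool:
        # Only suppress once enough evidence has accumulated.
        entry = self.stats[(check, locale_pair)]
        if entry["raised"] < self.min_samples:
            return False
        return entry["dismissed"] / entry["raised"] >= self.dismiss_threshold
```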

Despite the various shortcomings of current practices in LQA, the potential is there to streamline and improve processes and workflows alike, so much so that quality assurance will not be seen as a “burden” anymore, but rather as an inextricable component of localization, both in theory and in practice. It is up to us to embrace the change and move forward.