A comparative study of post-editing guidelines

By Ke Hu September 16, 2016

Post-editing has been increasingly researched and implemented by language service providers (LSPs) in recent years because of the productivity gains it can bring to translators. However, there is no single widely accepted set of post-editing guidelines. Since needs vary, a single general standard seems unlikely, which is why several different sets of guidelines exist. This article compares five of them: one from TAUS (2016), one from the LSP Moravia (2014) and three from academic scholars: Sharon O’Brien (2010); Bartolomé Mesa-Lao (2013); and Marian Flanagan and Tina Paulsen Christensen (2014).

Since most organizations prefer to keep their post-editing guidelines for internal use only, I have access only to the few that have been published. Among them, I selected the five proposals mentioned previously because they were published recently, are relatively complete and are organized into two categories: light (rapid or fast) post-editing, and full (or heavy) post-editing. For ease of comparison, the five selected sets of guidelines are general rather than language-dependent or aimed at specific content.

Different levels of post-editing

According to ISO 17100:2015, post-editing means to “edit and correct machine translation output.” In 2003, Jeffrey Allen distinguished between different levels of post-editing. He first explained the factors that determine the post-editing level and then proposed categorizing the types and levels of post-editing by inbound and outbound translation. For inbound translation, there are two levels: machine translation (MT) with no post-editing (for browsing or gisting), and rapid post-editing. For outbound translation, which is intended for publication or wide dissemination, there are three levels: no post-editing, minimal post-editing and full post-editing. The intermediate category of minimal post-editing was termed “fuzzy and wide-ranging.”

Rather than differentiating between guidelines for light and full post-editing, TAUS differentiated between two levels of expected quality: “good enough” quality and “human translation” quality. For comparison purposes, however, this article treats these two quality levels as corresponding to light and full post-editing, which are the two most popular post-editing levels.

It can be seen clearly that most people or organizations dealing with translation have very similar views about these two levels of post-editing. Light post-editing usually means the quality is good enough or understandable, while for full post-editing, “human-like” is usually the key phrase. According to TAUS, full post-editing should reach quality similar to “high-quality human translation and revision” or “publishable quality,” while light post-editing should reach a lower quality, often referred to as “good enough” or “fit for purpose.” As Donald A. DePalma, founder of Common Sense Advisory, put it in 2013:

“Light post-editing converts raw MT output into understandable and usable, but not linguistically or stylistically perfect, text… A reader can usually determine that the text was machine-translated and touched up by a human… Full post-editing, on the other hand, is meant to produce human-quality output. The goal is to produce stylistically appropriate, linguistically correct output that is indistinguishable from what a good human translator can produce.”

Iconic, an MT company based in Dublin, categorizes light and full post-editing by answering three questions: what, when and result. It suggests that light post-editing is for internal dissemination, while full post-editing is for wide dissemination or certified documentation.

Comparative studies of guidelines

TAUS established post-editing guidelines in partnership with CNGL (Centre for Next Generation Localisation) in 2010 with the hope that organizations could use the guidelines as a baseline and tailor them for their own purposes as required. This was the first attempt at publicly available, industry-focused post-editing guidelines. The guidelines start with some recommendations on reducing the level of post-editing required. TAUS highlighted two main criteria that determine the effort involved in post-editing: the quality of the raw MT output and the expected end quality of the content. It then proposed guidelines for each level of expected quality. In 2014, Flanagan and Christensen carried out a research project in which they tested the TAUS guidelines among translation trainees. Based on the results, they developed their own set of guidelines for use in class. They adopted the TAUS guidelines for light post-editing and, for translator training purposes, proposed tailored guidelines for full post-editing built on the TAUS baseline. In 2016, TAUS updated its guidelines to include more detail than the previous set. For the purposes of this article, I will discuss only the section that elaborates on the guidelines for the different levels.

At the 2010 AMTA conference, O’Brien presented a tutorial on post-editing. She first introduced the general post-editing guidelines of Wagner, then the guidelines on light and full post-editing respectively. Mesa-Lao restated O’Brien’s general guidelines in his study in 2013. He reported his suggestions on how to decide whether an MT output should be recycled in post-editing or not. He also mentioned the rules of Microsoft (the “5-10 second evaluation” rule and the “high 5 and low 5” rule) on making these decisions in his research.

Although LSPs possess their own tailored post-editing guidelines, very few have been released online. Lee Densmer, senior manager at Moravia, wrote about her post-editing guidelines in a blog post on Moravia’s website. The guidelines are her personal opinion but may represent what Moravia does to some extent. Like Allen, Densmer listed the factors that determine the post-editing level. They both believed that the client and the expected level of quality played important roles. Given their dates of publication, we could argue that the factors listed by Densmer are more closely related to modern technology, while the factors listed by Allen are more traditional, including the time available for translation and the life expectancy and perishability of the information. Densmer pointed out that the key phrases for light post-editing were “factual correctness” and “good enough,” which is in line with TAUS. She argued that light post-editing was not an easy job for linguists, because they had to try their best to turn a blind eye to “minor” errors. With reference to full post-editing, she indicated that “the effort to achieve human level quality from MT output may exceed the effort to have it translated by a linguist in the first place,” and Iconic supports this assertion. Finally, she described the “shades of grey”: the fact that many clients want the quality of full post-editing at the price and speed of light post-editing.

Inspired by the categories used in the LISA QA Model (Localization Industry Standards Association Quality Assurance Model) and SAE (Society of Automotive Engineers) J2450 translation quality metric, I created Tables 1 and 2 to compare the five proposals of post-editing guidelines. According to the variables in the left column, I listed all the corresponding requirements of the five proposals. There are some differences in terminology used by authors, but these terms appear to refer to roughly the same concept, such as “accurate” and “correct.” If the guidelines did not mention the variable, the cell was left blank.

From Table 1, it can be seen that all proposals value the accuracy of the message and the correctness of semantics in light post-editing, while grammar, syntax and style are not a big concern. O’Brien and Mesa-Lao believe that there is no need to spend too much time researching incorrect terminology, while Densmer contends that terminology should be consistent. TAUS, Flanagan and Christensen, and O’Brien hold that basic spelling rules should be applied and that the text should be adapted to the target culture. Most proposals state that a sentence should not be restructured if it is already understandable or correct. O’Brien clearly points out that the quality expectation for light post-editing is low. Densmer emphasizes machine-induced errors and translation alternatives in her guidelines.
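The layout described for the comparison tables (one row per variable, one column per proposal, a blank cell where a proposal is silent) can be sketched as a small data structure. The following is only a minimal illustration in Python, filled in with three of the light post-editing observations reported in this article. The row names, the cell wording and the helper functions are assumptions made for the example and do not reproduce the actual Tables 1 and 2.

```python
from typing import Dict, List, Optional

# The five proposals compared in this article (table columns).
PROPOSALS: List[str] = [
    "TAUS",
    "Flanagan and Christensen",
    "O'Brien",
    "Mesa-Lao",
    "Densmer",
]

# Rows are variables; each row maps a proposal to a paraphrased requirement.
# A proposal missing from a row stands for a blank cell. The wording of each
# cell is an illustrative paraphrase, not a quotation from the guidelines.
LIGHT_PE: Dict[str, Dict[str, str]] = {
    "accuracy": {p: "message must be accurate" for p in PROPOSALS},
    "terminology": {
        "O'Brien": "limit research time",
        "Mesa-Lao": "limit research time",
        "Densmer": "keep consistent",
    },
    "spelling": {
        "TAUS": "apply basic rules",
        "Flanagan and Christensen": "apply basic rules",
        "O'Brien": "apply basic rules",
    },
}


def cell(table: Dict[str, Dict[str, str]], variable: str, proposal: str) -> Optional[str]:
    """Return the requirement in one cell, or None if the cell is blank."""
    return table.get(variable, {}).get(proposal)


def blank_cells(table: Dict[str, Dict[str, str]], variable: str) -> List[str]:
    """List the proposals that have no entry for a variable."""
    return [p for p in PROPOSALS if cell(table, variable, p) is None]


def is_unanimous(table: Dict[str, Dict[str, str]], variable: str) -> bool:
    """True only if every proposal has an entry and all entries are identical."""
    values = {cell(table, variable, p) for p in PROPOSALS}
    return len(values) == 1 and None not in values
```

Keeping blank cells as missing entries, rather than empty strings, makes it easy to separate "the proposals disagree" from "a proposal does not address this variable," which is the distinction the comparison relies on.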

Regarding full post-editing, TAUS and Densmer expect the quality to be indistinguishable from human translation, and they emphasize the significance of fine style. However, O’Brien and Mesa-Lao do not see a need to pay much attention to style. They expect the quality after full post-editing to be medium rather than equal to translation from scratch. Should the quality after full post-editing be the same as human translation, or may it retain traces of machine translation? We can see from Table 2, especially in the “Others” row, that the resource center and the LSP are more inclined toward human translation quality than the scholars are. Scholars likely prefer medium-quality post-editing to human translation quality because they do not want to undermine human translation, even though post-edited machine translation can reach or even exceed human translation quality today. As we always insist, machine translation exists to help humans, not to replace them.

If full post-editing is expected to reach human translation quality, the question remains whether it is more cost-effective than translating from scratch. It is even debatable whether post-editing brings productivity gains at all, which has led to skepticism about its benefits. Ana Guerberof reported productivity gains in her research in 2009. Marcello Federico and his colleagues also found productivity gains in 2012. However, in 2014, Federico Gaspari and his colleagues found that post-editing could lead to productivity losses compared with translation from scratch.

The requirements of the full post-editing guidelines go beyond those of the light post-editing guidelines, particularly in terms of accuracy, semantics and culture. Unlike the light post-editing guidelines, most full post-editing guidelines require correct terminology, grammar, punctuation, syntax and formatting.

Setting a standard

From this comparative study, we can see that the existing guidelines have many overlaps, especially for light post-editing. The main differences lie in the full post-editing guidelines and concern the requirement for style and the expected quality of the target text, which I believe depends on the use and type of the text.

As I mentioned before, there are no standard guidelines, so LSPs and their clients should discuss and create their own tailored post-editing guidelines together beforehand. Clients should tell LSPs exactly what light and full post-editing are expected to include before contracting for a job. Quality levels, throughputs and expectations must be defined in advance.

Although customization is a popular choice among companies today, attempts to set a standard will continue. Recently, ISO has made headway in setting standard post-editing guidelines and currently offers a preview of ISO/DIS 18587.2(en): Translation services – Post-editing of machine translation output – Requirements. This standard specifies the post-editing process, the competencies and qualifications of post-editors, and the requirements of full post-editing.

In addition to the general guidelines discussed above, there are other sources of post-editing guidelines that are either language-dependent or purpose-specific. They range from guidelines for BOLT machine translation evaluation in 2014 to guidelines for lay post-editors in an online community in 2015.

Acknowledgements: This work is supported by the Science Foundation Ireland ADAPT project (Grant No.: P31021).