The state of post-editing

By Isabella Massardo October 30, 2015

Post-editing of machine translation (MT) is as old as MT itself and, just like MT, it has been generating great controversy since its inception. Many professional translators have been refusing to post-edit MT output because of the low quality and the inherent defectiveness of MT engines, while many language service providers (LSPs) cannot figure out the best practices for using and offering MT and post-editing services. A general misunderstanding of what post-editing really is and its association with revision can be considered the common denominator for the problems of both groups.

Academic seal of approval

MT and all its related skills are slowly but surely making their way into university curricula all over the world. One example is the translation curriculum at Dublin City University (DCU). On April 30 and May 1, 2015, the School of Applied Language and Intercultural Studies (SALIS) at DCU hosted the “MT Train the Trainer” workshop, sponsored by the European Association for Machine Translation (EAMT). The workshop’s objective was to provide trainers with the necessary understanding, confidence and skills to teach MT to professional and would-be translators.

At the beginning of his presentation, one of the speakers, Professor Andy Way, told the participants (among whom were many lecturers from various European universities belonging to the EAMT network) that one of his main tasks, and one of the main goals of his course on statistical machine translation (SMT), is debunking myths and fighting prejudices about MT. In fact, misunderstandings are still widespread and deeply rooted in the academic world as well as in the industry.

Key points of discussion during the workshop were the skills of a good post-editor, MT evaluation and post-editing skills to be included in the program. In this respect, the fundamental question was — and actually still is — whether post-editing should be taught only to highly skilled translators or to beginners as well.

These questions remain largely unanswered, as emerged from the DCU curriculum illustrated by Professor Dorothy Kenny and Sharon O’Brien. The master’s program in translation technology at DCU runs over an eight-month academic year, split into two semesters. MT and post-editing is one of five compulsory modules. Of the remaining four, two are centered on translation practice and the profession, one on terminology and one on translation theory. The curriculum also offers optional modules such as localization and corpus linguistics.

The DCU program focuses on general background knowledge of various MT-related topics, from the difference between rule-based machine translation and SMT to quality evaluation and post-editing. In lab sessions, students are guided through the building of an SMT engine, starting with the selection of source texts and evaluation metrics, continuing with the optimization of source texts and translation processes and the improvement of the engine, and ending with the evaluation of the raw output. In brief, the DCU curriculum offers a good beginning and the promise of a brilliant academic future for MT and post-editing. However, one question comes to mind: is the profile of future post-editors really that complicated?
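The evaluation step that closes such a lab session can be made concrete with a small example. The following Python sketch is not taken from the DCU course material. It is a simplified illustration of one common idea: measuring how far the raw MT output is from its post-edited version by counting word-level edits. The function names are hypothetical, and the measure is a plain word-level Levenshtein distance, a simplified stand-in for established metrics such as TER, which also account for block shifts.

```python
def word_edit_distance(hypothesis, reference):
    """Word-level Levenshtein distance (insertions, deletions, substitutions).

    Simplified illustration only: real metrics such as TER also model shifts.
    """
    hyp, ref = hypothesis.split(), reference.split()
    prev = list(range(len(ref) + 1))  # distances from an empty hypothesis
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]  # distance from hyp[:i] to an empty reference
        for j, ref_word in enumerate(ref, start=1):
            cost = 0 if hyp_word == ref_word else 1
            curr.append(min(prev[j] + 1,          # delete hyp_word
                            curr[j - 1] + 1,      # insert ref_word
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def edit_rate(raw_mt, post_edited):
    """Edits needed to turn raw_mt into post_edited, per post-edited word."""
    ref_len = len(post_edited.split())
    if ref_len == 0:
        return 0.0 if not raw_mt.split() else float("inf")
    return word_edit_distance(raw_mt, post_edited) / ref_len


# Hypothetical segment pair: the raw output is missing one article.
rate = edit_rate("the cat sat on mat", "the cat sat on the mat")
print(round(rate, 2))  # one edit over six words: prints 0.17
```

A lower rate suggests that less manual intervention was needed, although a number of this kind says nothing about the cognitive effort involved in each edit.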

Dublin is once again at the forefront in implementing translation technology in academic curricula. But other universities are now following DCU’s example and are working to include an optional course on machine translation and post-editing. Even the conservative Italian academic world is lining up to join the cause. Both the University of Bologna and UNINT (Università degli Studi Internazionali) in Rome, for instance, have introduced a similar course on MT and post-editing for the 2015/2016 academic year. At first glance, the programs seem to be of a very theoretical nature. Also, browsing through the curriculum and reference material, one cannot fail to notice two issues: the most recent recommended articles date back to 2012, and there are no references to post-editing issues in various languages. The UNINT course description mentions lab assignments in only one language combination (English into Italian). In any case, it will be interesting to see how these curricula evolve.

During DCU’s “MT Train the Trainer” workshop, two main weak spots were eventually identified: the still evolving nature of post-editing and the lack of shared methodologies for an efficient post-editing process.

The ISO/DIS 18587 standard

Shared methodologies and standards are important because they define the building blocks of common protocols to be adopted by all. Standards are not compulsory — we don’t have to adhere to them, but they are meant to make life easier. Think of electric plugs and how the differences and similarities between them affect us.

After 70 years, post-editing is still very much an evolving skill and, therefore, a standard could be premature, unless it is meant to regulate and guide rather than to normalize.

Unfortunately, the long-awaited ISO standard on post-editing (still in draft, now at close-of-voting stage) represents a step away from reality: first, because there was no direct contribution from translation practitioners and, second, because it aims to establish principles and requirements for a discipline that is still very much in flux and seems to be heading in a completely different direction.

For example, the draft in question defines post-editing as revision of a machine-translated text (see 2.1.4 of the document), which clearly goes against the nature of post-editing itself as a process/skill set inherently different from translation and, therefore, revision.

The goals and scope of MT are different from those of human translation, and the same applies to post-editing and revision. It is therefore pointless to keep comparing these four different areas. The definition offered by TAUS in its 2010 report on post-editing (“the process of improving a machine-generated translation with a minimum of manual labor”) allows for a more precise distinction between post-editing and revision.

Section 3 of the ISO draft standard, on preproduction and production, lists a long series of tasks without any clear methodological indication of how to proceed and, in a way, it includes some of the tasks belonging to the more traditional translation process (such as terminology checks and format control).

The confusion in the ISO standard continues in Section 4.1 with the list of translation skills presented as necessary competences. Translation competence is the first thing that stands out, together with the “linguistic and textual competence in the source language and the target language,” which leaves no room for monolingual post-editing done by subject matter experts.

In short, the standard draft on post-editing is too little, too soon. The risk of such an early normalization is that it could very quickly become outdated.

Post-editing in CAT tool environments

In recent years, MT has become one of the functionalities available to computer-aided translation (CAT) tool users, first through general engines such as Google Translate and Microsoft Translator and then with APIs for other commercial engines. All main CAT tools nowadays have plug-ins to MT engines offered by the main technology providers of our industry. This might be one good explanation of the sudden popularity of post-editing, although some practitioners are not ready to admit to having been charmed by the productivity offered by an MT engine.

In a blog post published on August 18, 2015, Memsource makes a bold statement by declaring that post-editing might be going mainstream. Based on the data collected within the Memsource community, over 50% of all the translations from English to Spanish and from English to German are done by combining a translation memory with an MT engine. Whether this is enough to claim that post-editing is overtaking conventional translation remains to be seen. It would be interesting to compare Memsource data with those from other CAT tool providers.

Another good point made in the aforementioned blog post, and one that ISO/DIS 18587 ignores, is the evolution of post-editing from conventional to interactive. Conventional post-editing was born with MT and happens when the raw output is sent to the post-editor “as is,” while interactive post-editing is enabled by integration with other CAT tool functionalities and allows users to leverage fuzzy matches and MT suggestions for higher productivity.
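The interplay between fuzzy matches and MT suggestions can be illustrated with a short sketch. It does not reproduce how Memsource or any other CAT tool works internally. It only assumes a common pattern: look up the translation memory first and fall back to the MT engine when no match is similar enough. The function names, the 0.75 threshold, the sample memory and the `fake_mt` stand-in are all hypothetical, and the similarity score comes from Python's standard `difflib` module rather than from a real CAT tool's matching algorithm.

```python
from difflib import SequenceMatcher


def best_fuzzy_match(source, memory):
    """Return (score, target) for the most similar source segment in the TM.

    `memory` is a hypothetical dict mapping source segments to translations.
    """
    best_score, best_target = 0.0, None
    for tm_source, tm_target in memory.items():
        score = SequenceMatcher(None, source.lower(), tm_source.lower()).ratio()
        if score > best_score:
            best_score, best_target = score, tm_target
    return best_score, best_target


def suggest(source, memory, mt_engine, threshold=0.75):
    """Offer a TM fuzzy match if it is good enough, otherwise an MT suggestion.

    `mt_engine` is any callable that maps a source string to a translation;
    the threshold value is an assumption, not an industry constant.
    """
    score, target = best_fuzzy_match(source, memory)
    if target is not None and score >= threshold:
        return {"origin": "TM", "score": round(score, 2), "text": target}
    return {"origin": "MT", "score": None, "text": mt_engine(source)}


# Hypothetical data: a one-entry memory and a placeholder MT engine.
memory = {"Press the red button.": "Premere il pulsante rosso."}
fake_mt = lambda text: "[MT] " + text

print(suggest("Press the green button.", memory, fake_mt)["origin"])
print(suggest("Quarterly figures improved.", memory, fake_mt)["origin"])
```

In this sketch the post-editor always receives one suggestion together with its origin, which is the information needed to decide how much trust to place in it before editing.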

SDL was the first CAT tool provider to catch on to the changing nature of post-editing. The SDL online post-editing course is available as part of the company’s certification program and offered free to SDL Trados license holders. It focuses mainly on the three statistical engine types used at SDL (baseline, vertical and customization) and, more specifically, on the BeGlobal technology, with recommendations on how best to use SDL Trados for efficient automatic and manual post-editing.