Post-editing MT: Is it worth a discount?

Machine translation (MT) is undeniably an amazing, albeit controversial, technology. At times, its poor performance and errors in dealing with ambiguous texts make us chuckle; other times, MT output is so unintelligible that it leaves us puzzled.

Nevertheless, internet users love MT and use it extensively. Interestingly, while over 200 million web users use MT from Google Translate alone every month, only a few translators and language service providers admit to using it for their work. Professional translators often claim that it actually reduces translation quality and productivity. Such reasoning, however, is often based on anecdotal evidence and on using the wrong approach when integrating MT into the professional translation workflow.

We have been using MT extensively for many years, and are currently using it for light post-editing (around 10% of our turnover) and in over 50% of our standard translation projects, where it serves as an extra suggestion alongside translation memories (TMs). While constantly exploring new ways to further integrate MT in all of our processes, we are developing software to make it the standard suggestion source in computer-aided translation (CAT) tools.

Software alone, however, is not enough to guarantee that MT will succeed in the translation workflow. There are other factors to bear in mind, among which are whether translators are willing to use MT and how such technology affects their productivity and income. We tend to assume that using MT results in savings for customers and lower rates for professional translators. However, this is not always true, and we still need to understand whether it makes sense to apply discounts for MT post-editing, both for customers and for translators, and if so, to what extent.


MT and translation providers

The translation industry has not proven too keen on adopting MT on a wide scale and many professionals in the industry still dismiss it as a laughable, mostly useless technology and refuse to adopt it for their work — or do they?

In fact, translators are probably not as disapproving and opposed to MT as it would appear by reading the comments that are so often published in online public forums. Observing the success that the MyMemory SDL Trados plug-in had combining MT and collaborative TMs, it seems that translators are actually quite willing to use MT — at least, when such technology is well integrated and not imposed on them, and when they can reap the benefits of using it.

Professional translators, language service providers (LSPs) and clients alike clearly understand that MT can prove an effective means to improve productivity and therefore reduce turnaround times and translation costs. Sure, one could debate whether MT should indeed be used in any translation project or whether it should be restricted to specific projects. Yet the decision on whether to adopt MT usually boils down to one simple question: assuming that the desired quality level is guaranteed and that the processes allow for the use of such technology, will MT improve translators’ productivity?

MT quality is not always predictable. It depends on a number of factors related to linguistic and technological issues: some language pairs are inherently more difficult than others to translate via MT, while for some other languages or domains there is not enough training data to build an effective MT engine. As a result, the effort translators must invest to produce a translation starting from MT output is inconsistent. Consequently, there is no one-size-fits-all approach to setting translators' and clients' rates for projects where MT is used. We carried out a first attempt to solve the problem through an analysis of the purchase order acceptance rate when offering an MT post-editing option. During this experiment, we sent out purchase orders offering two options: translators could either be paid their full per-word rate for a 1,000-word translation from scratch, or the same rate for a lower number of weighted words, such as 700 (1,000 raw words – 300 words discounted due to MT matches = 700 words), for post-editing MT output. We started by offering 500 weighted words and gradually increased the number of words paid for the post-editing job until over 75% of translators were opting for the post-editing job over the standard translation job.

The number that prompted translators to switch to post-editing varied depending on the language pair: for English to French and English to Italian, it was around 730 words, which meant that MT matches allowed for a 27% discount. The opposite happened with English to German, where the number had to be increased to 1,100 words. If we wanted our translators to accept post-editing jobs in this language pair, we would actually have had to pay them more than for a standard translation.
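The weighted-word arithmetic behind these figures can be sketched in a few lines. The only numbers taken from our experiment are the 730- and 1,100-word thresholds; everything else is illustrative.

```python
# A small sketch of the weighted-word scheme described above. The only
# figures taken from the experiment are the 730 and 1,100 word thresholds.

def mt_discount(raw_words, weighted_words):
    """Discount obtained thanks to MT matches, as a fraction of raw words."""
    return 1 - weighted_words / raw_words

# English to French and English to Italian: translators switched to
# post-editing at around 730 weighted words per 1,000 raw words.
print(round(mt_discount(1000, 730), 2))   # 0.27, i.e. a 27% discount

# English to German: the threshold climbed to 1,100 weighted words,
# a negative "discount" -- post-editing would cost more than translating.
print(round(mt_discount(1000, 1100), 2))  # -0.1
```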

This approach to defining the appropriate rate for post-editing jobs is quite fair as it gives translators full control over the best way to increase their productivity. However, it was only effective as a means to empirically understand how much MT was helping translators, but proved unfeasible for broader implementation. A more practical and usable way to measure how much MT improves productivity was needed.


Defining productivity

Defining whether MT would be useful in a specific project, and to what extent it would reduce turnaround times and costs, is indeed a delicate task. Some research has been carried out on the subject, and a number of metrics have been proposed with the goal of predicting the quality of machine translation for a given project and hence the usefulness of MT. However, the key element for all stakeholders involved is productivity as related to the actual effort needed to produce the desired output, be it a ready-to-publish translation or a “good enough” post-edited text.

Productivity can be expressed in terms of two performance indicators:

Time to edit: the average number of words processed by the translator in a given timespan.

Post-editing effort: the average percentage of word changes applied by the translator on the matches provided.

The first indicator directly expresses the time and labor required of the translators, and hence improvements in this figure are directly related to cost savings. The second indicator measures the quality of the matches provided by the TM and MT. This corresponds to computing a distance score between the matches provided by the system and the post-edited version submitted by the user. The indicator is indeed an estimate of the percentage of edit operations performed over the whole set of translated segments.
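The second indicator can be illustrated with a minimal sketch: a word-level edit distance between the match offered to the translator and the text finally submitted, normalized to a fraction. This is an illustration of the idea, not the actual scorer used in any particular tool.

```python
# Minimal sketch of the post-editing effort indicator: word-level
# Levenshtein distance between the suggested match and the final
# translation, normalized by the length of the final text.

def word_edit_distance(a, b):
    """Levenshtein distance computed over word tokens."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (wa != wb)))    # substitution
        prev = cur
    return prev[-1]

def post_editing_effort(match, post_edited):
    """Fraction of word edits relative to the length of the final text."""
    return word_edit_distance(match, post_edited) / max(len(post_edited.split()), 1)

mt_suggestion = "the cat sat on mat"
final_version = "the cat sat on the mat"
print(post_editing_effort(mt_suggestion, final_version))  # one word inserted over six
```

Averaging this figure over all segments of a job gives the estimate of the percentage of edit operations described above.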

The ability to understand to what extent MT can increase productivity makes it possible to identify when such technology can be integrated into standard TM tools. Measuring productivity using the abovementioned performance indicators requires the collection of a large amount of data from real translation projects that have leveraged MT. In order to reliably measure productivity gains and collect post-editing data, specific technologies are required to record the translators’ editing patterns and interactions with the software during translation, and the time needed to perform a given post-editing job. Together with the research organization Fondazione Bruno Kessler, the University of Le Mans and the University of Edinburgh, we are working on MateCat, a European Union funded project that has among its goals the development of an enhanced web-based CAT tool integrating new MT functionalities.

The MateCat tool is an enterprise level CAT tool that can be used in real translation projects to collect information on the editing patterns and time to edit of each segment post-edited or translated by professional translators. It is able to collect:

Matches provided by the TM server (if any), along with their match quality scores.

Matches provided by the MT engine (if any), along with their match quality scores.

Target segments edited by the translator.

Time taken to edit each segment (measured by adding the time used to perform multiple edits on the same segment).

Post-editing effort measured by the word edit distance between the first match provided and the final translation.

The information is displayed in real time on a web interface and is also available in a CSV file, which allows for in-depth analysis of the results of each field test. Such data can then be analyzed to draw up statistics on the performance, and hence predict the usefulness of machine translation in specific language pairs and domains.
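The kind of per-segment analysis the CSV export enables can be sketched as follows. The column names used here are assumptions for illustration; the actual export may name its fields differently.

```python
# Sketch of aggregating per-segment post-editing data from a CSV export.
# Column names ("time_to_edit_s", "post_editing_effort") are hypothetical.
import csv
import statistics

def productivity_stats(path):
    """Mean time to edit (seconds) and mean post-editing effort per segment."""
    times, efforts = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            times.append(float(row["time_to_edit_s"]))
            efforts.append(float(row["post_editing_effort"]))
    return statistics.mean(times), statistics.mean(efforts)
```

Aggregating these two figures per language pair and domain is what allows statistics on performance to be drawn up, and hence the usefulness of MT to be predicted.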

We believe that MT will become the predominant technology for production and that it will be integrated with current TM technology, so as to be used in the broadest range of projects. To this day, however, there are still no industry standards or common practices for a fair payment scheme for post-editing jobs, as there are for translation jobs where TM is used. Current CAT tools do not integrate time-to-edit or post-editing effort measurements to allow for a fair and effective MT quality evaluation. We are approaching the problem by developing technologies to measure the average time to edit in post-editing projects, so as to understand what can be expected in terms of productivity improvements from adopting MT. This will eventually provide a solid basis of statistical data for drawing up accurate payment and cost schemes. Our initial results show that post-editing data-rich and morphologically simple languages, such as English, French, Italian and Spanish, requires an effort comparable to fixing a 75-99% TM fuzzy match (and would consequently be paid at about 60% of the full rate for new translations). Morphologically rich languages such as German and Czech do not appear to leave room for any discounts.
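A payment scheme along these lines could be sketched as a lookup over match quality bands, with MT for morphologically simple languages paid like a 75-99% fuzzy match. The band boundaries and fractions below are illustrative, not an industry standard.

```python
# Hedged sketch of a banded payment scheme. Only the 75-99% band paid
# at ~60% of the full rate comes from our initial results; the other
# bands and fractions are hypothetical examples.

PAY_BANDS = [
    (100, 0.30),  # exact match (illustrative fraction)
    (75, 0.60),   # 75-99% fuzzy match: roughly where MT post-editing sits
    (0, 1.00),    # below 75%: treated as a new translation, full rate
]

def pay_fraction(match_quality):
    """Fraction of the full per-word rate for a given match quality (0-100)."""
    for threshold, fraction in PAY_BANDS:
        if match_quality >= threshold:
            return fraction
    return 1.0

print(pay_fraction(85))  # 0.6
print(pay_fraction(40))  # 1.0
```

For morphologically rich languages such as German and Czech, the measured post-editing effort would place MT output in the lowest band, which is exactly the "no discount" outcome described above.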

As of today, however, the available quality metrics and tools do not help much in predicting whether, and to what extent, MT is useful for translation providers and buyers alike. An open discussion with translators and customers still seems to be the only viable solution for LSPs.