Humans still beat machines when it comes to literary translation

November 8, 2022

You wouldn’t use Google Translate to produce an English version (or a version in any other language, for that matter) of a novel like Gabriel García Márquez’s Cien años de soledad. Or would you?

The answer to that question is likely still a resounding “no.” Although researchers have been fascinated with potential applications for machine translation (MT) in the field of literary translation, “any serious challenge to human literary translators [from machines] is still a long way off,” as the European Council of Literary Translators’ Associations put it in a 2020 report.

That said, researchers are still exploring how MT can be applied to literary works. A recent study from researchers at the University of Massachusetts Amherst set out to show why MT usually falls flat compared to human literary translation.

“As literary MT is understudied (especially at a document level), it is unclear how state-of-the-art MT systems perform … and what systematic errors they make,” the researchers wrote in a paper recently posted as a preprint and freely available on arXiv.

To shed light on some of the problems with literary MT, the researchers collected a corpus of non-English literary works that met the following criteria:

In the public domain in their source country by 2022
Multiple human translations published in English
Published in an electronic format
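
As a rough illustration, the three criteria above can be read as a simple filter over candidate works. The sketch below is in Python and is not the researchers' actual pipeline: the Work record, its field names, and the reading of "multiple" as "two or more" are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Work:
    """Hypothetical record for a candidate literary work (illustrative fields only)."""
    title: str
    source_language: str
    public_domain_year: int      # year the work entered the public domain in its source country
    english_translations: int    # number of published human translations into English
    has_electronic_edition: bool


def meets_criteria(work: Work, cutoff_year: int = 2022) -> bool:
    """Return True if the work satisfies all of the selection criteria listed above."""
    return (
        work.source_language != "English"          # corpus covers non-English works
        and work.public_domain_year <= cutoff_year  # public domain by 2022
        and work.english_translations >= 2          # assumption: "multiple" means two or more
        and work.has_electronic_edition             # available in an electronic format
    )


# Made-up placeholder candidates, not works from the actual dataset.
candidates = [
    Work("Work A", "Russian", 1950, 3, True),
    Work("Work B", "French", 2030, 2, True),    # not yet in the public domain
    Work("Work C", "Japanese", 1990, 1, True),  # only one English translation
]
selected = [w.title for w in candidates if meets_criteria(w)]
```

With these placeholder records, only "Work A" passes all three checks.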

The dataset that the researchers compiled — named PAR3 — includes at least two human translations of every source paragraph. To test the efficacy of MT for literary uses, the researchers used Google Translate to create English versions of the source paragraphs and presented them, side-by-side, with the human translations, to two groups of readers: professional literary translators and monolingual English writers.
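
A side-by-side comparison of this kind is typically summarized as a preference rate: the share of comparisons in which raters picked the human translation. The following is a minimal sketch of that tally in Python, assuming each judgment is recorded as the label "human" or "machine". The function name and the toy judgments are hypothetical and are not taken from the study.

```python
from collections import Counter


def preference_rate(choices, label="human"):
    """Fraction of pairwise judgments in which raters chose `label`."""
    counts = Counter(choices)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to summarize")
    return counts[label] / total


# Toy judgments for illustration only (not the study's data).
toy_judgments = ["human", "human", "machine", "human"]
toy_rate = preference_rate(toy_judgments)  # 3 of 4 judgments favor the human version
```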

Perhaps unsurprisingly, both groups overwhelmingly preferred the human translations, choosing them over the machine-translated version 84% of the time. The raters also shared insights that the researchers believe could be used to improve MT’s potential for literary applications. Based on their feedback, the researchers identified five ways in which MT can be improved. Nearly half of the problems raters flagged involved “over-literal” translation. While these instances may not have been outright errors, they often disrupted the flow of the paragraph, making the text feel awkward to read.

Additionally, lack of context caused about 20% of the issues reported in the MT paragraphs. The remaining problems stemmed from poor word choice, over- or under-precision, and so-called “catastrophic” errors that “completely invalidate the translation” (misgendering a character, for instance). The researchers also used these insights to build a GPT-3-based automatic post-editing model to adjust the machine-translated output. The post-edited versions were rated more favorably than the unedited versions produced by Google Translate.

“Overall, our work uncovers new challenges to progress in literary MT, and we hope that the public release of Par3 will encourage researchers to tackle them,” the researchers conclude.
