Why zero-shot translation may be the most important MT development in localization

In the months since Google’s announcement about its deployment of neural machine translation (NMT) caught the tech press’ eye, NMT has rapidly transformed from a pie-in-the-sky dream into a mainstream MT method. As MT developers, language services providers (LSPs), and social media enterprises have rushed to jump on the bandwagon, what the technology can realistically deliver often gets lost in the hype.

One basic point needs to be clear: NMT is a kind of statistical MT, not an artificial super intelligence. It differs from formerly state-of-the-art phrase-based statistical MT (PbMT) systems in key ways, but at its heart, it still relies on large amounts of human translation to work. It does not “understand” human language, nor does it learn languages the way people do. The advantage of NMT over PbMT is that it can consider entire segments in one pass and can see layers of correlations. Where older systems could detect only simple if-then correlations to determine translations, NMT can deduce much more complex correlations that involve multiple items and even correlations of correlations. The resulting translations should be more accurate and the system better able to learn from heterogeneous data.
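To make that contrast concrete, here is a deliberately toy Python sketch, not a real MT system: the phrase table, data, and encoder/decoder stand-ins are all invented for illustration. The PbMT-style function makes each phrase choice locally and misses a reordering, while the NMT-style stand-in conditions its output on an encoding of the entire segment.

```python
# Toy contrast, for illustration only: PbMT-style local phrase
# lookup vs. an NMT-style whole-segment pass. All data is invented.

phrase_table = {
    ("the", "frog"): "la rana",
    ("green",): "verde",
}

def pbmt_translate(tokens):
    """Greedy longest-match lookup: every phrase choice is local."""
    out, i = [], 0
    while i < len(tokens):
        for span in (2, 1):              # try longer phrases first
            key = tuple(tokens[i:i + span])
            if key in phrase_table:
                out.append(phrase_table[key])
                i += span
                break
        else:
            out.append(tokens[i])        # pass unknown words through
            i += 1
    return " ".join(out)

def toy_encoder(tokens):
    # Stand-in for a neural encoder: one pass collapses the whole
    # segment into a single state.
    return tuple(tokens)

def toy_decoder(state):
    # The decoder sees the full state, so it can handle the
    # adjective-noun reordering that the local lookup gets wrong.
    known = {("the", "green", "frog"): "la rana verde"}
    return known.get(state, " ".join(state))

print(pbmt_translate(["the", "green", "frog"]))            # "the verde frog"
print(toy_decoder(toy_encoder(["the", "green", "frog"])))  # "la rana verde"
```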

Common Sense Advisory (CSA Research) finds that NMT does represent a major advance in MT technology, but the claims about the degree of improvement over PbMT from some early adopters relied on somewhat dubious methods that significantly overstated the difference. As a result, mainstream tech reporting quickly rehashed variants of the perennial claim that MT is five years away from replacing human translators. Others stated that NMT learns languages just like humans do, because it relies on neural networks that emulate the function of mammalian brains, or that Google’s NMT system had invented its own language.

In the midst of these hyperbolic claims, it was easy to lose sight of one of the most significant advances in the field. In November of 2016, Google announced that its system had acquired a new capability: zero-shot translation (ZST), the ability to translate in language pairs for which a system has not been trained.

With traditional PbMT technology, a system that translates bi-directionally between three languages would have six engines — one for each source and target pair. Each of them would have its own training data (although pairs like English > Spanish and Spanish > English might use the same underlying data) because an engine can handle only one language pair. However, Google’s NMT engine can handle multiple language pairs simultaneously. The ability to learn layers of correlations also allows it to learn correlations between language pairs. For example, it could observe that English frog typically translates as Spanish rana and as Hungarian béka and then use that association to predict that rana translates as béka, even in the absence of any Spanish<>Hungarian training data.
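Google’s published description of its multilingual system gives a feel for how one engine covers many pairs: an artificial token prepended to each source sentence (such as <2es>) tells the shared model which target language to produce. The Python sketch below, with invented data and helper names, shows how such a training corpus might be prepared; note that it contains no Spanish<>Hungarian pairs at all, which is exactly the zero-shot case.

```python
# Illustrative sketch (invented data and function names): preparing
# training examples for one shared multilingual engine by prepending
# a target-language token, in the spirit of Google's approach.

parallel_data = [
    ("en", "es", "the frog", "la rana"),
    ("en", "hu", "the frog", "a béka"),
    # No es<>hu pairs anywhere in the training data.
]

def to_training_examples(records):
    examples = []
    for src_lang, tgt_lang, src, tgt in records:
        examples.append((f"<2{tgt_lang}> {src}", tgt))   # e.g. en > es
        examples.append((f"<2{src_lang}> {tgt}", src))   # reverse direction
    return examples

for source, target in to_training_examples(parallel_data):
    print(source, "->", target)

# At inference time, the input "<2hu> la rana" asks this same engine
# for a Spanish > Hungarian translation it was never trained on.
```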

This capability becomes crucially important for language combinations with little or no training data. For example, very little Greek<>Finnish data exists. However, substantial amounts of training data can be found for each of those languages in combination with English. With PbMT systems, the English data would not directly help the Greek<>Finnish case.

By contrast, ZST allows NMT systems to fill in gaps in the training data, even when the corpora for different language pairs share no sentences in common, although performance improves as overlap increases and systems still work best with training data in the pair itself. For example, a ZST system for news data could use translations of different articles for different languages and still learn from them. This ability is tremendously important for language pairs like Greek<>Finnish. In such cases, the ability to learn from all available data can make the difference between having an engine and having insufficient training material to do anything.

The alternative to ZST is classic pivot translation, in which text is translated into an intermediary language and from that language into the target (such as Finnish > English > Greek). Pivot translation can be useful, but it tends to compound MT errors with each step. In addition, it uses data only from the language pairs involved in the translation task (Finnish <> English and Greek <> English in this case), even if other pairs might have relevant information. By contrast, ZST in an NMT system can simultaneously leverage all training data from all combinations. In the Finnish <> Greek case, this ability means it might benefit from relevant data in four or five other languages.
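A minimal sketch makes the compounding problem visible. The mt_stub function below is a hypothetical stand-in for a single-pair engine, and the closing arithmetic is a simplification for illustration rather than a measured figure.

```python
# Hypothetical sketch of pivot translation; mt_stub is an invented
# stand-in for a single-pair engine.

def mt_stub(text, src, tgt):
    # A real engine would introduce some error at every hop; the
    # wrapper notation just makes the chain of hops visible.
    return f"{tgt}({text})"

def pivot_translate(text, src, tgt, pivot="en"):
    intermediate = mt_stub(text, src, pivot)    # hop 1: fi > en
    return mt_stub(intermediate, pivot, tgt)    # hop 2: en > el, inheriting hop 1's errors

print(pivot_translate("sammakko", "fi", "el"))  # el(en(sammakko)): two chances to err

# Simplified arithmetic: if each hop independently preserved 90% of
# the meaning, two hops would preserve roughly 0.9 * 0.9 = 81%.
hop_quality = 0.9
print(f"direct: {hop_quality:.2f}, pivot: {hop_quality * hop_quality:.2f}")
```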

ZST stands to be one of the most important developments in MT. Consider the European Union’s 24 official languages. By law, citizens have the right to use their languages to communicate with government, but in practice this works only if they speak one of a handful of languages: a speaker of Maltese who needs to communicate with a Latvian-speaking official will invoke this right in vain. The EU faces a daunting if not impossible task in collecting the training data it would need for the 552 engines (24 × 23 directional pairs) required to cover all official combinations. A ZST system would drastically cut this burden and enable the EU to provide more support in more languages, while providing better results than would be possible with a pivot-based process.
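The arithmetic behind these engine counts is simple directional-pair counting; the same formula yields the six engines for three languages mentioned earlier.

```python
# Directional engine count under the one-engine-per-pair model.

def engines_needed(n):
    return n * (n - 1)

print(engines_needed(3))   # 6: the three-language example above
print(engines_needed(24))  # 552: the EU's official languages
```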

In a field where exaggerated and fantastic claims are common, ZST may be one development whose importance has largely escaped attention. By breaking down some of the walls that have confined under-resourced and less-common languages to second-class status in MT, it may have a profound impact on language access and the ability of citizens to interact with governments, corporations and other entities. Despite breathless reporting about NMT as the realization of Douglas Adams’ Babel fish, it will not replace human translators. But zero-shot translation will help fill the large gap where the alternative to MT is not human translation but zero translation.