Why zero-shot translation may be the most important MT development in localization


Arle Lommel
MultiLingual July/August 2017

In the months since Google’s announcement about its deployment of neural machine translation (NMT) caught the tech press’ eye, NMT has rapidly transformed from a pie-in-the-sky dream into a mainstream MT method. As MT developers, language services providers (LSPs), and social media enterprises have rushed to jump on the bandwagon, what the technology can realistically deliver often gets lost in the hype.

One basic point needs to be clear: NMT is a kind of statistical MT, not an artificial super intelligence....

In the midst of these hyperbolic claims, it was easy to lose sight of one of the most significant advances in the field. In November of 2016, Google announced that its system had acquired a new capability: zero-shot translation (ZST), the ability to translate in language pairs for which a system has not been trained.

With traditional phrase-based MT (PbMT) technology, a system that translates bidirectionally between three languages would need six engines, one for each source and target pair. Each engine would have its own training data (although pairs like English > Spanish and Spanish > English might use the same underlying data) because an engine can handle only one language pair. Google’s NMT engine, by contrast, can handle multiple language pairs simultaneously. Its ability to learn layers of correlations also allows it to learn correlations between language pairs. For example, it could observe that English frog typically translates as Spanish rana and as Hungarian béka, and then use those associations to predict that rana translates as béka, even in the absence of any Spanish<>Hungarian training data.
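The frog/rana/béka idea can be sketched in a few lines of code. This is a toy illustration only: a real NMT system learns a shared internal representation across languages rather than explicit word tables, so the dictionaries and function below are assumptions made purely to show the pivoting logic.

```python
# Toy sketch of zero-shot inference. The "training data" is two
# word tables the system has actually seen; there is no direct
# Spanish<>Hungarian data anywhere.
en_to_es = {"frog": "rana", "house": "casa"}
en_to_hu = {"frog": "béka", "house": "ház"}

# With pairwise engines, n languages translated in both directions
# need n * (n - 1) engines: 3 languages -> 6 engines.
def engines_needed(n):
    return n * (n - 1)

def zero_shot_es_to_hu(spanish_word):
    """Guess a Spanish -> Hungarian translation by exploiting the
    associations both trained pairs share with English."""
    # Invert the English -> Spanish table to recover the English word.
    es_to_en = {es: en for en, es in en_to_es.items()}
    english = es_to_en.get(spanish_word)
    if english is None:
        return None
    return en_to_hu.get(english)

print(engines_needed(3))           # 6
print(zero_shot_es_to_hu("rana"))  # béka
```

A single multilingual model replaces all six of those pairwise engines, which is why the correlations it learns across pairs matter so much.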

This capability becomes crucially important for language combinations with little or no training data....