Machine translation (MT) is on the upswing of one of its periodic hype cycles. Based on surveys of companies using or evaluating it, Common Sense Advisory (CSA Research) predicts that the majority of multilingual enterprises will use the technology to meet a growing part of their global content needs. In terms of volume, it will be the dominant means of translation. Two hybrid variants will increasingly blur the boundary between MT and the work of human linguists: MT output edited by humans (aka post-editing), and augmented translation, which serves up machine suggestions alongside other resources to translators à la translation memory. In the last year, development accelerated as neural MT (NMT) left the lab and became commercially available. This new variant arrived just as mainstream business interest in artificial intelligence (AI) and machine learning exploded, amplifying both hype and investment.
While enterprise planners see MT as a major component in their multilingual content strategy, some language service providers (LSPs) wonder whether it will put them out of business. Others see it as another business opportunity. In 2016, a CSA Research survey found that most large LSPs already deliver some form of the technology to their customers. It also determined that those that embrace this technology grow at a much faster rate than those that don’t. However, the research showed that smaller LSPs struggle to find personnel with the skills to move beyond basic services. In addition, they must deal with pricing models that are in rapid flux as they, and system developers, try to find ways to monetize MT. The choice also goes beyond which supplier to buy the software from. As more suppliers offer commercial NMT solutions, buyers must choose from a menu of rule-based, statistical, neural and various hybrid machine translation technologies, all with their own strengths, weaknesses and ideal use cases.
The blistering pace of development and change can leave both buyers of translation services and LSPs confused about when to deploy MT and what type to use. Wait too long, and they may find themselves left behind, but if they move too early, they may waste effort and resources on technology that is not yet ready for prime time.
Recently the German Research Center for Artificial Intelligence (DFKI) in Berlin conducted a detailed linguistic examination of two systems: 1) an English>German statistical MT (SMT) engine based on Moses, trained and used by an LSP; and 2) an untrained NMT system. The primary goal was to determine how well the generic NMT system performs and to identify both system types’ specific strengths and weaknesses. The DFKI study found that the statistical system outperformed the neural one when measured using traditional quality metrics such as Bilingual Evaluation Understudy (BLEU) and METEOR. However, a close examination finds that NMT output actually exhibits fewer errors in most linguistic categories (Table 1).
| Phenomenon | Occurrences | Percentage correct (NMT) | Percentage correct (Moses) |
|---|---|---|---|
| formal address | 138 | 90% | 86% |
| genitive | 114 | 92% | 68% |
| modal construction | 290 | 94% | 75% |
| negation | 101 | 93% | 86% |
| passive voice | 109 | 83% | 40% |
| predicate adjective | 122 | 81% | 75% |
| prepositional phrase | 104 | 81% | 75% |
| terminology | 330 | 35% | 68% |
| tagging | 145 | 83% | 100% |
| Sum/average | 1453 | 89% | 73% |
The results show that even an untrained NMT system outperforms state-of-the-art SMT in many respects, but falls short on tagging. Many developers are experimenting with ways to supplement their systems with rule-based procedures that fix tagging problems, so this difference is likely to disappear. The neural software also does worse on terminology, which is not surprising for a generic system. Trained neural engines should perform similarly to SMT on terminology, especially as modules specific to this area improve.
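The per-category comparison behind these observations can be sketched in a few lines of Python. This is only an illustration: the figures are copied from Table 1, while the names `TABLE_1`, `gaps` and `leaders` are hypothetical helpers introduced here and are not part of the DFKI study.

```python
# Figures copied from Table 1:
# (phenomenon, occurrences, NMT % correct, Moses % correct)
TABLE_1 = [
    ("formal address", 138, 90, 86),
    ("genitive", 114, 92, 68),
    ("modal construction", 290, 94, 75),
    ("negation", 101, 93, 86),
    ("passive voice", 109, 83, 40),
    ("predicate adjective", 122, 81, 75),
    ("prepositional phrase", 104, 81, 75),
    ("terminology", 330, 35, 68),
    ("tagging", 145, 83, 100),
]


def gaps(rows):
    """Return (phenomenon, NMT minus Moses) pairs in percentage points."""
    return [(name, nmt - moses) for name, _count, nmt, moses in rows]


def leaders(rows):
    """Split the phenomena by which system scores higher."""
    nmt_ahead = [name for name, gap in gaps(rows) if gap > 0]
    moses_ahead = [name for name, gap in gaps(rows) if gap < 0]
    return nmt_ahead, moses_ahead


if __name__ == "__main__":
    # List the categories from NMT's biggest lead to its biggest deficit.
    for name, gap in sorted(gaps(TABLE_1), key=lambda pair: pair[1], reverse=True):
        print(f"{name:22s} {gap:+d}")
```

On these figures, NMT leads in seven of the nine categories, with the widest margin on passive voice (43 points), while Moses leads only on terminology and tagging.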
Results of a separate DFKI study also indicate that neural systems outperform SMT, and perform roughly on par with the best rule-based systems, on tests that emphasize the ability to handle grammatical phenomena. Neural systems are also able to generalize in this area from smaller sets of training data. This difference will be crucial as MT uptake increases for “complex” languages such as Hungarian and Hindi, where SMT has traditionally struggled.
The net finding in the DFKI studies is that NMT does indeed represent a significant advance in the state of machine translation technology. It is becoming usable for situations and languages where rule-based and statistical approaches were not suitable. The differences will only become more pronounced as developers create new auxiliary modules to handle formatting and terminology, two areas that may themselves benefit from separate neural approaches. As NMT combines these modules with automatic content enrichment (ACE) and improved project management systems into an augmented translation approach, CSA Research predicts that the industry will see the development of a continuum of service from “pure” MT to pure human translation, with every grade in between.
The challenge for LSPs will be to find ways to differentiate these services, market them, and monetize them. Unfortunately, MT developers have issued press releases that business journalists seize on to write breathless articles about the imminent collapse of the language barrier. Accordingly, some buyers expect the impossible from the MT software they can buy today, or assume that the output of free online services will be good enough to publish without any human intervention. The very real improvements in neural technology muddy the water, but over time the current hype will subside and more mature models will prevail.
Despite hyperbolic claims, NMT is not about to replace humans. It surely does not understand or learn language the way humans do. Prominent pundits see the current boom in this area as the harbinger of an era in which artificial superintelligences will take over and see humans as irrelevant, but realistic AI researchers still do not know how to create such Terminator-style entities. Today it is possible to build a system that can drive a car or play chess, but not a chess-playing computer that also drives a car, much less one that can also translate. Machines may perform specific tasks well, but they lack understanding of the world and cannot transfer knowledge laterally between domains. As a result, MT will remain dependent on human translation for its training data for the foreseeable future. In effect, MT will be a highly effective and useful parrot rather than something that can think and translate like a professional linguist.
Nevertheless, NMT will increasingly take on low-end tasks on the margins of the language industry. It will handle routine text that is highly similar to previously translated materials or that would otherwise remain untranslated. The translators of tomorrow will have more in common with skilled industrial engineers than with today’s linguists, who operate in a craft-driven model. They will wield an array of technologies that amplify their ability and they will be able to focus on those aspects that require human intelligence and understanding, while leaving routine tasks to MT.
The results of CSA Research’s and DFKI’s examinations of NMT suggest that it is not a passing fad but a major breakthrough that LSPs and their clients should actively investigate. Those that wait will find themselves at a disadvantage. However, they should look for places where it makes sense to add NMT and other technologies without disrupting their current operations. They should not expect an overnight miracle, but instead prepare for a rising tide that will eventually reach most corners of the language industry. CSA Research predicts that the range of what MT can effectively address will increase, but that MT will also drive demand for human language services and help prevent the erosion in effective hourly rates many translators have experienced. Finally, they should realize that input quality matters even more for machines than for humans. The old rule about “garbage in, garbage out” still holds.
Note: Portions of this research were supported by the EC’s Horizon 2020 research and innovation programme under grant agreement number 645452 (QT21).