When machine translation (MT) was in its infancy, just being able to deploy one MT engine seemed accomplishment enough for an enterprise. As a result, the MT debate centered on which engine to choose: a rule-based (RBMT), statistical (SMT) or hybrid engine? Claims and counterclaims were made, but “bake-off” contests were unable to resolve them once and for all: one engine would win for French, another for Japanese, a third for Russian, and no one was any closer to answering the question “What engine should we use?”
This uncertainty about which approach to adopt kept many a company sitting on the fence, unwilling to deploy MT until they knew for sure which system to invest in. Now that the industry has matured, we know that with machine translation there is no one-size-fits-all; MT performance is far more nuanced than that.
In this regard, MT technology is not the same as translation memory (TM) technology. Both MT and TM are productivity enhancers, and in an ideal situation the two work together. But while you can choose a single TM tool with no major consequences, MT is more complex. The performance you get depends on factors such as content type and language pair, or even the direction of the language pair. Deep expertise across a range of engines is required to know when a particular approach will excel.
No single approach — SMT, RBMT or hybrid — suits all content types, all projects and all languages, so getting MT right requires an open mind and a pragmatic approach. It requires testing, benchmarking and the use of quality metrics to identify the best-of-breed engine for a given context. In short, it requires an evidence-based, technology-agnostic approach.
MT is a process, not a tool
In the early days of MT, the belief was that you chose your tool, then off you went. But after nearly a decade of deployments in full production environments, LexWorks — the MT services branch of Lexcelera — has concluded that MT is about much more than a tool. MT is a full industrial process involving tools, content, workflow, integration and more. And part of that process is identifying best-of-breed performance wherever you use MT.
Best-of-breed solutions can be identified by applying a set of practical guidelines based on factors such as language combination, content type, file formats and available data, as well as by rigorously benchmarking engine performance at launch. In the LexWorks process, for example, assumptions are tested by benchmarking against a variety of objective quality metrics such as BLEU, GTM and SymEval, as well as more subjective criteria such as human sentiment analysis, understandability measures and, in the case of online customer support, answers to the question “Did this solve your problem?” Together, these measures ensure that you are working with the best-of-breed solution.
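Of these metrics, BLEU is the most widely used. As a rough illustration of what such a score measures, here is a minimal corpus-level BLEU sketch (uniform 4-gram weights with a brevity penalty, one reference per candidate, and none of the smoothing or standardized tokenization that production toolkits apply):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU: geometric mean of clipped n-gram precisions
    (n = 1..max_n) multiplied by a brevity penalty."""
    clipped = [0] * max_n   # n-gram matches, clipped to reference counts
    total = [0] * max_n     # n-grams proposed by the candidates
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts = Counter(ngrams(c, n))
            r_counts = Counter(ngrams(r, n))
            clipped[n - 1] += sum(min(k, r_counts[g]) for g, k in c_counts.items())
            total[n - 1] += max(len(c) - n + 1, 0)
    if min(clipped) == 0:
        return 0.0  # unsmoothed: any empty precision zeroes the score
    log_precision = sum(math.log(clipped[i] / total[i]) for i in range(max_n)) / max_n
    brevity = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity * math.exp(log_precision)

# A raw MT hypothesis scored against a post-edited reference:
score = corpus_bleu(["the cat sat on a mat"], ["the cat sat on the mat"])
```

In practice one would use a maintained implementation (for example NLTK’s `nltk.translate.bleu_score` or sacrebleu), which adds smoothing for short segments and standardized tokenization so that scores are comparable across engines.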
“A good MT strategy should be technology-agnostic and look for the most efficient solution on a case-by-case basis,” according to Rubén Rodríguez de la Fuente of PayPal. PayPal is one of a small group of pioneering enterprises that today form the leaders in the MT space.
Companies such as PayPal, Symantec, Autodesk and Adobe all started by piloting a single engine; when they ran up against performance or integration barriers, they piloted others. And at the end of the day, these industry leaders discovered that there isn’t one best engine for every situation, that the same engine that performs well in Spanish or French may not perform well in Japanese or Russian, and that one approach suits well-structured content but not necessarily user-generated content. These companies care deeply about quality and they use whatever engine will best get them there.
The MT process favored today by many leaders creates a workflow that streams content into a panel of engines, depending on the use case. In Symantec’s process, for example, the technology integrated into their workflow includes RBMT, SMT and hybrid engines: Systran, Microsoft Translator Hub, PROMT and Moses.
When we talk about MT, we often talk about the concept of “good enough.” Some may wonder why it is important to strive for MT quality at all. Given that post-editors will improve the output to the level needed, regardless of the starting point, is it even worth the trouble to get MT right? We believe it is, for two reasons: the availability and willingness of post-editors, and cost.
Accepting MT that is merely “good enough” leaves post-editors to pay the price for substandard MT. My personal belief is that it is time to put quality back on the table. For one thing, it is hard to recruit post-editors when they receive amateurish MT and are expected to fix it — at a discount! But good MT is a win-win, because as quality improves, so does post-editing productivity. This is a significant metric because post-editing expenses are the lion’s share of a localization project’s cost structure. Even considering training, on an initial MT project (in this example, half a million words), post-editing is 80% of the project cost (Figure 1). Once the engine has been trained, this proportion grows to 90% (Figure 2), and in further iterations it is even higher. Improving post-editing productivity is therefore the most effective way to reduce costs.
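The proportions in Figures 1 and 2 are easy to reproduce with back-of-the-envelope numbers. The rates below are invented for illustration; only the 80%/90% shares come from the figures:

```python
# Hypothetical budget for a 500,000-word MT project.
# All monetary figures are assumptions; only the 80%/90% shares
# follow the proportions described in the article.
words = 500_000
pe_rate = 0.04                     # assumed post-editing cost per word
pe_cost = words * pe_rate          # 20,000 in post-editing

setup_initial = 5_000              # assumed engine training + setup, first project
setup_trained = 2_222              # assumed maintenance only, once trained

share_initial = pe_cost / (pe_cost + setup_initial)   # -> 0.80
share_trained = pe_cost / (pe_cost + setup_trained)   # -> ~0.90

# Because post-editing dominates the budget, post-editor productivity is
# the biggest lever: a 25% throughput gain cuts the post-editing line by 20%.
pe_improved = pe_cost / 1.25       # 16,000
saving = pe_cost - pe_improved     # 4,000
```

The exact rates do not matter; once training costs are amortized, almost any plausible figures leave post-editing as the dominant line item, which is why better raw MT output pays for itself.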
Case studies
How does being technology-agnostic work in practice? While there are general rules of thumb — for example, that SMT works better with user-generated content — LexWorks has found it valuable to test assumptions extensively. We generally test at least three engines before starting any long-term project, whether for a customer support site or for documentation in 17 languages. To give some real-world examples of our findings, here are some case studies showing why we chose one engine over another.
Factory, Novocherkassk, Russia
Challenge: The challenge of this three-year project was to translate two to 200 pages of English to Russian each and every day. The content was mainly technical specifications and contracts.
Constraints: There was no bilingual data available at project start to train engines.
Solution: RBMT. Without data to train an SMT engine, a rule-based engine was the de facto choice. In any case, we often pair an RBMT or hybrid engine with Russian, as it is a morphologically complex language.
eDiscovery
Challenge: To translate 30,000 pages, mostly e-mails, technical reports and meeting minutes from Japanese to English in order to identify information that could be considered a smoking gun.
Constraints: Content was written with little attention to grammar or spelling, and was highly colloquial.
Solution: Hybrid. We chose a hybrid engine because SMT works best with grammatically incorrect and colloquial sentences, while RBMT tends to perform best in the Japanese-English pair.
Response to a technical RFP
Challenge: To translate 3,400 pages in one week from French and English to Brazilian Portuguese for a response to a request for proposal (RFP).
Constraints: Limited data at project start, and limited time for training. The content came in many different files, and there were multiple passes on each file as the customer rewrote while the translation process was going on.
Solution: Hybrid. The SMT component of the hybrid was helpful in allowing us to input TMs as training material and also adapt to changing source text. The RBMT component allowed us to enter key terminology and to save on post-editing time.
Online customer support site
Challenge: To make dynamic content available on a customer support website in nine languages in order to solve customer issues before they became a call to the help desk.
Constraints: Extremely colloquial user-generated content with little attention to correct grammar and spelling, extensive use of abbreviations and content unlike what is found in product documentation. The server needed 24/7 uptime.
Solution: Online SMT. To ensure the system was trained on enough in-domain/out-of-domain material (including sentence constructs not found in user documentation) and was also available online 24/7, we chose the Microsoft Translator Hub widget. The widget was customized with product names, Do Not Translates and the results of post-editing spot checks.
Self-service MT server
Challenge: One of the top five banks in the world needed an MT system behind their firewall so that their employees would not send sensitive information out to Google for translation.
Constraints: This customer has many different business units such as investment, insurance, construction and automotive leasing, with sometimes competing terminology.
Solution: Hybrid. To manage the very domain-specific terms, we needed an RBMT engine to organize and rank terminology by business unit, while a large volume of bilingual corpora was available to train a statistical component; this led us to a hybrid engine that offers both. The hybrid server now sits on the client’s premises, and we are able to maintain and update it remotely.
Questions to ask
To determine the best engine, some good questions to ask are: What kind of content do I need to translate? Is it well-authored or user-generated? What language pairs do I need? What quality level? Will there be post-editors assigned to this work? What internal and external resources are available (for example, linguists and engineers)? Is there enough in-domain/out-of-domain data to train an engine?
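As a rough illustration, the answers to those questions can be folded into a triage sketch like the one below. The function and its rules are hypothetical simplifications drawn from the case studies above, not LexWorks’ actual selection process, and any real choice would still be benchmarked:

```python
def suggest_engine(content_type, has_training_data, language_pair):
    """Rough engine triage based on the rules of thumb in this article.

    Illustrative only: real selection requires benchmarking several
    engines on the actual content. Expects full language names in the
    pair, for example "english-russian".
    """
    # Languages the article flags as benefiting from a rules component
    morphologically_complex = {"russian", "japanese"}
    pair_langs = set(language_pair.lower().split("-"))

    if not has_training_data:
        return "RBMT"      # no bilingual corpus means SMT cannot be trained
    if content_type == "user-generated":
        # SMT copes best with colloquial, ungrammatical text...
        if pair_langs & morphologically_complex:
            return "hybrid"  # ...but pair it with rules for complex morphology
        return "SMT"
    if pair_langs & morphologically_complex:
        return "hybrid"
    return "SMT"
```

For instance, the Novocherkassk factory case (no training data) maps to RBMT, while the eDiscovery case (colloquial Japanese-English) maps to a hybrid, matching the choices described above.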
All approaches — SMT, RBMT and hybrid — perform well as long as they are matched with the right content. But just as you would not touch up a photo in FrameMaker or lay out a technical manual in Photoshop, superior performance comes from matching the engine to the content. Continuing to maintain and update the chosen engine over time improves quality even more. Quality is back on the table because sometimes “good enough” just isn’t.
Part two of this article will appear in an upcoming issue of MultiLingual.