Enterprise Innovators: Engine agnostic machine translation at PayPal

PayPal is the leading global online payment company. Founded in December 1998, PayPal has 123 million active registered accounts, is available in 190 markets and has localized marketing websites in 81 regions around the world. Rubén Rodríguez de la Fuente is one of PayPal’s two machine translation (MT) specialists and a member of the globalization technology team. He has been in the language industry for over a decade and works out of PayPal’s office in Madrid.

Thicke: What is your background and how did you end up in the role of MT specialist for PayPal?

Rodríguez de la Fuente: I have a bachelor’s degree in translation and I’ve been working mostly as a translator or project manager. I worked for a while in software localization and that’s what aroused my interest in all things technical. I used to be as skeptical as the next person regarding MT, but when I heard about TAUS, I thought if big companies were pooling linguistic assets and know-how, maybe they would succeed, so I started educating myself on the subject.

Shortly after, MT was introduced at PayPal for a few languages. As a linguist, I was responsible for customizing and maintaining the user dictionary for English into Spanish. Eventually, we increased the number of MT languages and the first MT specialist, my colleague Olga Loskutova, needed some help. Since I had a personal interest in the subject, I seemed to be a good fit.

Thicke: What in particular interests you about MT?

Rodríguez de la Fuente: In the past few years I have become very interested in natural language processing in general. Computers will always fall short of perfection in the task of handling human language, but what they can achieve is nevertheless mind-blowing, like IBM’s Watson beating Jeopardy! champions or systems performing sentiment analysis on Twitter.

As for MT specifically, apart from its potential as a tool for translators, I’m very interested in use cases where it can help when human translators are not available, like the way it was used after the earthquake in Haiti a few years ago. It will not be as useful as a human translator, but it will nevertheless be helpful.

Thicke: What critical issues for PayPal does MT address?

Rodríguez de la Fuente: There are several ways MT helps us. Cost reduction is one of them, although not necessarily the most important. PayPal works to meet aggressive deadlines and does simultaneous shipping for all languages. Our motivation to use MT was to achieve timely deliveries with the highest possible level of quality.

Within a traditional localization workflow, MT allows us to reduce time-to-market and to enforce terminology and a consistent style, which reduces our in-house linguists’ workload when we’re doing quality assurance (QA) on outsourced translations.

We are also trying to find creative ways to use MT, like having our visual designers translate their prototypes with MT to make sure there won’t be text expansion issues later, or performing source content review to flag translatability issues (like concatenation and sentence fragmentation). MT is a very good tool for ensuring localization friendliness.

Finally, we encourage teams outside localization to leverage our MT systems for their particular purposes and we assist as needed to make sure their particular requirements are met. 

Thicke: How long have you been working with MT?

Rodríguez de la Fuente: At PayPal, we have been working with MT for three years — two of them in a production environment.

Thicke: You are known for being engine agnostic, for using a variety of MT engines — rule-based (RBMT), statistical (SMT) and hybrid. Why is this?

Rodríguez de la Fuente: We started with RBMT and things went reasonably well. Then we needed language pairs (Nordics) for which the existing RBMT technology did not meet our needs, so we had to go with SMT for those languages. Eventually, we ended up with a technology pool including RBMT, SMT and hybrid. These days it’s only SMT and hybrid, since we upgraded our RBMT systems to hybrid. We’ve seen that every technology has its pros and cons, and that every language pair is a different story with different requirements and needs. So I find it best to get acquainted with all the technologies and decide on a case-by-case basis.

Thicke: What are your criteria for deciding which approach to adopt for which languages and content?

Rodríguez de la Fuente: We have identified two main indicators for evaluating and selecting MT engines: linguistic quality and ease of integration with the existing computer-assisted translation tools and workflows. Often, all the focus is on the first indicator, but I’d say they are both equally important: an engine that produces great linguistic quality but integrates poorly with existing tools will be of little use. In terms of ease of integration, the engine must support seamless communication with your translation management system via an application programming interface (API) or something similar. As for linguistic quality, it is important to be systematic and to go beyond showing a sample to an in-house linguist or an external vendor and asking for feedback.
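
On the integration side, by way of illustration, the round trip a translation management system connector needs can be sketched in a few lines of Python. The endpoint, field names and response shape here are hypothetical; every MT vendor defines its own API:

    import json
    import urllib.request

    def translate_segment(text, source, target):
        # One segment in, one translation out: the minimal contract a
        # TMS connector relies on. All names below are hypothetical.
        payload = json.dumps({
            "source_lang": source,
            "target_lang": target,
            "text": text,
        }).encode("utf-8")
        req = urllib.request.Request(
            "https://mt.example.com/v1/translate",  # hypothetical endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["translation"]

    # translate_segment("Your payment was sent.", "en", "es")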

Thicke: What languages have you built MT engines for?

Rodríguez de la Fuente: We currently have MT engines in place for French (both European and Canadian), Italian, German, Spanish (both European and Mexican), Russian, Danish, Norwegian, Swedish and Simplified Chinese. We also have a normalization engine converting US English to British and Australian English.

Thicke: Does your source content pose any challenges with MT?

Rodríguez de la Fuente: Our source content for MT is mainly dynamic HTML, using variables that are replaced by corresponding values at run-time. The challenge of tags for MT is two-fold. First, tags can interfere with the parsing of the sentence, resulting in faulty output, and second, they might require adjustment of surrounding text or reordering depending on their value.
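
One common protective strategy, sketched below as a generic illustration rather than PayPal’s actual pipeline, is to mask tags and run-time variables with neutral tokens before translation and restore them afterward. This addresses the first problem (parsing interference) but not the second (reordering and agreement); the ${...} variable syntax is assumed for the example:

    import re

    TAG = re.compile(r"<[^>]+>|\$\{[^}]+\}")  # HTML tags and ${var} placeholders

    def mask_tags(segment):
        # Replace tags/variables with neutral tokens so they cannot
        # derail the engine's parse; remember them for restoration.
        saved = []
        def _sub(match):
            saved.append(match.group(0))
            return "__TAG%d__" % (len(saved) - 1)
        return TAG.sub(_sub, segment), saved

    def unmask_tags(segment, saved):
        for i, tag in enumerate(saved):
            segment = segment.replace("__TAG%d__" % i, tag)
        return segment

    masked, saved = mask_tags("You sent <b>${amount}</b> to ${recipientName}.")
    # masked == "You sent __TAG0____TAG1____TAG2__ to __TAG3__."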

Thicke: How do you define MT quality?

Rodríguez de la Fuente: Linguistic quality does not mean the MT output will be flawless, but rather that it will take you less time to post-edit the MT output than to translate from scratch. Tentatively, post-editing throughput should be 5,000 words a day or more.

Thicke: What languages do you use with an RBMT engine?

Rodríguez de la Fuente: We started Russian, Spanish, French, Italian and German with RBMT, but they have all been upgraded to hybrid now.

Thicke: RBMT is known for managing tags better than SMT. Do you agree? How do you handle tags in RBMT?

Rodríguez de la Fuente: In order to handle the tags, we have included them in the user dictionary with the appropriate part of speech. The RBMT engine takes care of adjusting the surrounding text and reordering the tags if needed.
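
Real RBMT user-dictionary formats are vendor-specific, but the idea can be sketched as follows: each tag or variable is registered as a lexical item with a part of speech, so the rule engine can inflect the words around it and reorder it like any ordinary noun. The entries and the ${...} syntax below are invented for illustration:

    # Illustrative only; actual dictionary formats vary by RBMT vendor.
    USER_DICTIONARY = [
        # (source entry,       target entry,        part of speech)
        ("${recipientName}",  "${recipientName}",  "proper noun"),
        ("${amount}",         "${amount}",         "common noun"),
    ]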

Thicke: How do you evaluate RBMT quality?

Rodríguez de la Fuente: To ensure good performance of the engine, we keep track of the edit distance (the amount of rework needed to bring the raw MT output to publishing quality) and the review distance (the rework applied by our in-house linguists to the vendor’s translation). The metric we use is WER (word error rate), since we find it more intuitive than BLEU (bilingual evaluation understudy).
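
As a minimal sketch, WER is the word-level Levenshtein distance between a reference (here, the post-edited or reviewed text) and a hypothesis (the raw MT output or the vendor translation), normalized by the reference length:

    def wer(reference, hypothesis):
        # Word error rate: word-level Levenshtein distance divided by
        # the number of reference words.
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edits needed to turn the first i reference words
        # into the first j hypothesis words
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[len(ref)][len(hyp)] / max(len(ref), 1)

The same function serves both metrics: edit distance compares raw MT output against its post-edited version, while review distance compares the vendor’s translation against the reviewed one.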

Thicke: What are the main advantages of RBMT?

Rodríguez de la Fuente: The main advantages of RBMT in our experience are the ease of customization (linguists can start playing around with the tool after a few hours of training; user dictionaries can be amended on the fly to fix errors); the good handling of tags; and the predictability — with a basic understanding of the tool, you know what kind of output you can expect.

Thicke: And the main disadvantages with RBMT engines?

Rodríguez de la Fuente: The main disadvantages are that they require more manual work — about three hours a week per language on average — and that the output will be accurate and grammatically correct, but sometimes not very fluent. They are also more expensive than SMT engines.

Thicke: What SMT technology did you choose, and did you take a do-it-yourself approach?

Rodríguez de la Fuente: The technology is based on the open-source toolkit Moses. The idea of using Moses out of the box and avoiding vendor costs can be tempting. However, Moses is rather complex, and vendors have customized it further to get better results with language pairs beyond the ones the system initially supported.

Thicke: What is the main advantage of SMT?

Rodríguez de la Fuente: What makes SMT engines really appealing is that they require no customization work on the part of linguists: the engine learns by itself through statistical analysis of translation memory (TM) corpora. Therefore, a lot of effort in terms of manual work and training is saved, and the engine can be ready in a matter of days. They are also generally cheaper than RBMT engines.

SMT will deliver good results for language pairs in which the target does not have very rich morphology, such as Danish, Norwegian and Swedish. For more complex languages such as Russian and German, it is worthwhile to invest in hybrid systems. RBMT can deliver good results provided customization is done on a regular basis, but it seems to be less efficient than the other two types of technology.

Thicke: And the main disadvantage of SMT engines?

Rodríguez de la Fuente: The drawback of SMT is that, for the time being, it is only capable of keeping the tags in place, without reordering them or adjusting the surrounding text. This is not a limitation of SMT itself, but rather of the TM corpora, which replace tag content (such as <result>countryName</result>) with numeric placeholders (such as {1}), preventing the engine from learning how the tags need to be adapted in the translation. We are currently assessing the feasibility of using XLIFF files (which do contain tag content) instead of TMX files as training corpora.
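
The effect of that placeholder substitution can be sketched as follows (illustrative; actual export behavior depends on the CAT tool):

    import re

    def tmx_style_placeholders(segment):
        # Mimic a TM export that collapses each inline tag pair into a
        # numeric placeholder, hiding the tag content from the engine.
        counter = 0
        def _sub(match):
            nonlocal counter
            counter += 1
            return "{%d}" % counter
        return re.sub(r"<[^>]+>[^<]*</[^>]+>", _sub, segment)

    print(tmx_style_placeholders("Send money to <result>countryName</result> today."))
    # -> "Send money to {1} today."
    # An engine trained on this learns where {1} goes, but never what
    # <result>countryName</result> required (agreement, reordering).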

Another limitation is that it is only efficient to retrain the engines every three to six months, so they are not as flexible as RBMT for incorporating quick terminology fixes. In terms of linguistic quality, the most frequent issues relate to wrong word forms, capitalization and punctuation.

A word of caution about SMT: the quality of the output is going to be only as good as the quality of the translations in your TM. If you are concerned that your TM might have terminology inconsistencies or mistranslations, it is best to do some QA on it before starting the engine training.
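
A simple first pass, for example, is to flag source segments that occur in the TM with more than one distinct translation. This is only a heuristic sketch (real TM QA covers much more), and the sample segments are invented:

    from collections import defaultdict

    def inconsistent_sources(tm_pairs):
        # Collect all distinct targets per source; sources with more
        # than one are candidates for review before engine training.
        targets = defaultdict(set)
        for source, target in tm_pairs:
            targets[source.strip().lower()].add(target.strip())
        return {src: tgts for src, tgts in targets.items() if len(tgts) > 1}

    tm = [
        ("Log in to your account.", "Inicia sesión en tu cuenta."),
        ("Log in to your account.", "Acceda a su cuenta."),  # inconsistent register
    ]
    print(inconsistent_sources(tm))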

Thicke: Finally, with respect to the hybrid approach, what is your experience there?

Rodríguez de la Fuente: We first rolled out the hybrid for German and Russian. The edit distance for those languages is a bit high due to their rich morphology (case inflection) and some heavy locale-specific customization requirements. The hybrid was rolled out for Spanish, French and Italian a few months ago, and edit distance has been reduced by a few percentage points (2%-4%) on average.

Thicke: And does your hybrid engine use RBMT or SMT as the baseline?

Rodríguez de la Fuente: It uses RBMT output as the baseline, and then refines it through comparison against a language model built with SMT techniques.
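
As a toy illustration of that statistical component (not the vendor’s actual implementation), a target-side n-gram language model trained on the TM can score alternative renderings and prefer the more fluent one:

    import math
    from collections import Counter

    def train_bigram_lm(corpus):
        # Tiny add-one-smoothed bigram model over the TM's target side;
        # a stand-in for the language model in a hybrid system.
        unigrams, bigrams = Counter(), Counter()
        for sent in corpus:
            words = ["<s>"] + sent.split()
            unigrams.update(words)
            bigrams.update(zip(words, words[1:]))
        vocab = len(unigrams)
        def logprob(sent):
            words = ["<s>"] + sent.split()
            return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                       for a, b in zip(words, words[1:]))
        return logprob

    lm = train_bigram_lm(["envía dinero a tu familia", "envía dinero hoy"])
    candidates = ["manda dinero a tu familia", "envía dinero a tu familia"]
    print(max(candidates, key=lm))  # picks the rendering the LM finds more fluent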

Thicke: What kind of results do you get with the hybrid?

Rodríguez de la Fuente: Initial testing of the hybrid output shows that it is generally more fluent and natural than RBMT. However, some typical mistakes of the hybrid include deprecated terminology due to outdated translations present in the TM, part-of-speech agreement mistakes, extra words in translation, extra punctuation and wrong capitalization. This results from unpredictable behavior of the statistical component, and we’ve seen these issues reduced in German and Russian after re-training the engine on cleaner data. In spite of these drawbacks, edit distance should be lower overall.

Thicke: Have you tried taking off the SMT processing for German and Russian?

Rodríguez de la Fuente: We started using the hybrid for these locales because RBMT was not meeting our requirements, so I don’t think disabling the statistical component would improve things. An alternative approach would be to fall back to RBMT (which is more predictable and consistent) and use search and replace scripts for automated post-editing of known error patterns. I’m not sure how much improvement that would bring compared with the hybrid, but it’s an idea we have in mind.
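
Such a script could be as simple as an ordered list of regular-expression fixes mined from recurring post-edits. The patterns below are invented examples for Spanish output:

    import re

    # Hypothetical known error patterns; real rules would come from
    # analyzing recurring corrections in post-edited data.
    APE_RULES = [
        (re.compile(r"\s+([,.;:])"), r"\1"),     # no space before punctuation
        (re.compile(r"\bPaypal\b"), "PayPal"),   # brand capitalization
    ]

    def post_edit(segment):
        # Apply the ordered search-and-replace fixes to raw MT output.
        for pattern, replacement in APE_RULES:
            segment = pattern.sub(replacement, segment)
        return segment

    print(post_edit("Envía dinero con Paypal ."))
    # -> "Envía dinero con PayPal."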

Thicke: PayPal is one of the few companies that has a truly engine-agnostic approach to MT. What conclusions have you drawn?

Rodríguez de la Fuente: A good MT strategy should be technology-agnostic and look for the most efficient solution on a case-by-case basis. The type of technology that best suits your needs will change depending on the language pair. The organization and resources of your company, in terms of headcount, know-how and existing linguistic assets, are also important factors to consider.