Confidently implementing MT for eCommerce

By Wayne Bourland and Deepak Nagabhushana, December 11, 2014

In today’s cloud-based world, many companies change their web content daily or even several times a day. The pace of change stretches marketing and go-to-market teams even in the source language, and the process often breaks down when publishing regionally. Compounding the issue, online consumers have constantly evolving needs and expect relevant, even personalized content. When they don’t find it in one place, they are willing to go elsewhere. Brand loyalty takes a back seat to the want-it-now mentality.

As a result, the desire of sales makers to harness the global reach of the internet is at odds with the tried and true human translation workflows most companies have historically used. Even so, marketing and localization teams hesitate to trust machine translation (MT) to deliver the expected quality, even though they understand the time advantage. BLEU scores and their ilk mean little to these teams when you are trying to build confidence in automation. Instead, forward-looking localization leaders have to speak the language of eCommerce.

The true potential of MT can be demonstrated using web analytics and A/B testing, a controlled experiment that compares two variants of the same content. You need to present consideration and conversion metrics in terms that marketers and sales makers understand. A/B testing, or test and target, gives marketers the website optimization capabilities to continually refine their online content and offers, making them more relevant to customers and yielding greater conversion.

For our purposes, designing a test setup is fairly simple. First, you decide which language you want to test and have the website content translated both through the traditional human translation process and by MT, most likely post-edited MT. Second, you use test and target to send 50% of the visitors to the human translation content and the other 50% to the post-edited MT content. At the same time, you run a website survey for the visitors who see the machine translated content and collect their feedback. Third, you analyze the site traffic to each set, looking at click through, click away and conversion for each.
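The split and the comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of the test-and-target tooling used at Dell: the function names, the hash-based bucketing and the choice of a two-proportion z-test are all assumptions made for the example.

```python
import hashlib
import math


def assign_variant(visitor_id: str) -> str:
    """Deterministically place a visitor in one of two buckets (roughly 50/50).

    Hypothetical helper: hashing the visitor ID keeps a returning visitor
    in the same bucket across sessions.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return "human" if digest[0] % 2 == 0 else "post_edited_mt"


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare two conversion rates; return (z, two-sided p-value)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Illustrative counts only, not results from the article.
    z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=100, n_b=1000)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value means the data cannot distinguish the two variants, which is the "essentially equal" outcome a localization team would hope to show marketers. The same comparison can be repeated for click through and click away.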

Now that you have meaningful data, it’s time to build the business case. We already know the post-edited MT process will be cheaper — how much so depends on a realistic judgment of how good the quality must be — and we know that the process should be quicker, or at least it will be after post-editors get through the learning curve. So now we have to weigh those advantages against the test outcomes.

From testing to action

At Dell, we found that the key metrics between the test sets were essentially equal, meaning both types of translation resulted in roughly the same customer behavior. Your results may vary, but the more tech-savvy your customers, the more likely it is that they will prefer function over form. Even if post-edited MT is at a slight disadvantage, consider that if the speed and cost advantages allow you to localize more content or get to market quicker, revenue can increase even with a slightly lower conversion rate. Additionally, you can always go back and improve the translation for higher-volume products.

After proving the benefits to ourselves, we threw caution to the wind and moved French and German fully to post-edited MT. We stood back and waited for the world to end, but after two quarters of holding our breath with not a single escalation or complaint, we set an aggressive strategy and moved Dell.com product content to post-edited MT for all 27 languages (now 28) within a year’s time. It’s a great success story, but it’s not as simple as it sounds, and there was a lot of groundwork to be laid first. Let’s look at some of the steps and challenges to making this successful on an enterprise scale.

First, yes, there are tradeoffs. Now is the point when everyone will say “see, I knew it!” MT technology has improved greatly over the past decade, but it still falls short of human translation quality, and some languages are just downright difficult for automated solutions. However, we weren’t forced to trade quality. We traded savings, accepting a very small post-editing discount as the engines ramped up and the translators made the transition. A couple of years into the program we are averaging about a 20% discount on new words, and while that may be small against what people expect to get in MT discounts, 20% of 60% (the percentage of new words we’d be translating from scratch) of our multi-million annual word volumes still adds up to a significant cost savings.
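To make the arithmetic concrete, here is a small sketch of the savings calculation. Only the 20% discount and the 60% share of new words come from the figures above. The annual volume and the per-word rate are hypothetical placeholders, since the article gives neither.

```python
def post_editing_savings(total_words: float,
                         new_word_share: float,
                         discount: float,
                         rate_per_new_word: float) -> float:
    """Annual savings from a post-editing discount applied to new words only."""
    new_words = total_words * new_word_share
    return new_words * rate_per_new_word * discount


if __name__ == "__main__":
    savings = post_editing_savings(
        total_words=5_000_000,    # hypothetical annual volume
        new_word_share=0.60,      # share of words translated from scratch
        discount=0.20,            # average post-editing discount on new words
        rate_per_new_word=0.20,   # hypothetical rate in dollars per word
    )
    print(f"Estimated annual savings: ${savings:,.0f}")
```

Put another way, a 20% discount on 60% of the words is worth 12% of what the whole volume would cost at the new-word rate, so the absolute savings scale directly with volume.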

In addition, we have seen our average turnaround time drop by more than 50%. That improved speed to publish is immeasurable in terms of revenue potential, but certainly garners a great deal of goodwill internally for a process that inevitably gets pegged as a choke point for content release. It’s unlikely we will gain much more in the way of cost improvements — it is eCommerce, and while function may be more important than form in certain industries, there is still an expectation of quality that can’t be ignored. Nobody wants to buy something from a company with shoddy content. Consumers around the globe understand and accept that content is translated, but they still expect a certain level of dedication and investment in the local market. Sadly, we won’t be putting raw MT on the site anytime soon, at least not outside of support content.

One way to mitigate the potential quality tradeoff is choosing a strong MT provider. All of our vendors leverage the same MT solution when performing post-editing. We made a number of stops along the way before finding the right fit. There is a large cadre of MT providers that do well with a few languages, and you may find yourself having to aggregate several solutions to meet all of your language needs. Many companies do this with internal linguists setting up and maintaining engines, but we believe that solution often leads to a disjointed approach. We took the route of having one of our language service providers (LSPs) source and test a number of technologies and finally settled on a single technology provider that is able to develop engines for all of our required languages. This greatly reduces management overhead on our part, and vastly simplifies tool integration.

Even with the right technology provider chosen, there are still a number of hurdles to clear: how you integrate the tools; how you set up the translation memory (TM) sequence for optimal performance; and whether you measure MT quality the same way you measure your human translation workflows, to name a few. Tool integration could make for an article or perhaps even an entire issue by itself, and is often dictated more by IT constraints and the legacy tools you have to work around than by dollars or know-how. We were able to connect our MT tool through a simple application programming interface (API) to our translation management system (TMS), which is already integrated with key content management systems (CMSs). Content flows from the CMS into the TMS, out for MT, back for post-editing and back into the CMS with no manual steps other than the post-editing itself.

Integration costs were minimal, and oftentimes we could convince our partners to help shoulder the cost since more often than not, the API would benefit other clients. Even without help, it doesn’t take many man hours to justify an integration over manual workflow steps. On the plus side, once the integration is enabled, you can plug the MT process into workflows as needed, leaving off post-editing when quality expectations allow. It becomes just another workflow stage, like vendor authorization or applying TM.

Because we demand a fairly high quality product from post-editing, we have the confidence to put the MT TM into the sequence, whereas more often than not, localization teams decide not to leverage anything resulting from an MT workflow. Our TM sequence goes something like this: first the primary TM, then the secondary TM, and lastly the MT TM, with no penalty. The split between primary and secondary is more about managing file sizes and making TM cleaning easier than about any difference in the content of each. Of course, we only leverage the MT TM for the MT workflow. We owe it to ourselves to analyze what percentage of the leverage is coming from which TM, but that’s a task for later.
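The sequencing logic can be sketched as an ordered lookup. This is a simplified illustration and not how any particular TMS implements it. It assumes exact-match lookups only (real TMs also return fuzzy matches), and the class and function names are invented for the example. It also shows one way to tally how much leverage comes from each TM.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class TranslationMemory:
    name: str
    entries: Dict[str, str]   # source segment -> target segment
    penalty: int = 0          # points subtracted from the match score
    mt_only: bool = False     # consult only inside the MT workflow


@dataclass
class Match:
    tm_name: str
    target: str
    score: int


def lookup(source: str, sequence: List[TranslationMemory],
           mt_workflow: bool) -> Optional[Match]:
    """Return the first exact match found, walking the TMs in sequence order."""
    for tm in sequence:
        if tm.mt_only and not mt_workflow:
            continue  # the MT TM is skipped outside the MT workflow
        target = tm.entries.get(source)
        if target is not None:
            return Match(tm.name, target, 100 - tm.penalty)
    return None


def leverage_by_tm(segments: Iterable[str], sequence: List[TranslationMemory],
                   mt_workflow: bool) -> Counter:
    """Count how many segments each TM supplied; misses are counted as 'new'."""
    counts: Counter = Counter()
    for segment in segments:
        match = lookup(segment, sequence, mt_workflow)
        counts[match.tm_name if match else "new"] += 1
    return counts


if __name__ == "__main__":
    sequence = [
        TranslationMemory("primary", {"Add to cart": "Ajouter au panier"}),
        TranslationMemory("secondary", {"Free shipping": "Livraison gratuite"}),
        TranslationMemory("mt", {"Ships today": "Expédié aujourd’hui"},
                          penalty=0, mt_only=True),
    ]
    segments = ["Add to cart", "Free shipping", "Ships today", "New arrival"]
    print(leverage_by_tm(segments, sequence, mt_workflow=True))
    print(leverage_by_tm(segments, sequence, mt_workflow=False))
```

Keeping the penalty as a per-TM setting makes the "no penalty" decision for the MT TM explicit and easy to revisit if post-editing quality ever slips.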

Quality concerns

Quality is such a polarizing topic that we saved it for last. Perhaps we are being presumptuous, or maybe it’s just wishful thinking, but we will speak from the premise that everyone accepts that translation quality is no longer defined as error-free. Error-free quality, the goal of our early and misguided quest, wasn’t achievable even in the days of human translation with review stages, correction stages and spot check stages, and who has the time or money for that anymore anyway? Surprisingly, we met little to no resistance on the client side when rolling out MT, but we did have to transition our translator workforce to the new paradigm. And, of course, when we say we, we really mean our LSPs.

It’s no surprise that translators have been resistant to post-editing. It means a significant shift in how they approach their work, and it often comes with increased time pressure and lower rates per word. Again, patience and an open relationship with our partners paid off here. Starting with small discounts, often as low as 5% and in many cases none at all, we allowed translators to ease into the new work style. We also gave them confidence that we were venturing into a brave new world with them, rather than just trying to squeeze them for all they are worth. We focused on measures that demonstrated increased productivity rather than relying on automated scores of automated processes. We measured quality the same way as in the traditional workflows: we looked for linguistic errors and readability in the final product rather than paying too much attention to the quality of the MT output alone. At the end of the day, it’s not linguistic analysis, automated MT scores or even the irrational complaints of internal clients that provide a true measure of your quality. Rather, it’s the end client who comes to your website and decides to either buy or move on who should determine your quality standard. Customer behavior and acceptance are the hardest of all of these to measure, but they are worth the effort.

Based on our experience, it appears that MT can be leveraged successfully for eCommerce, and it can be done with confidence. However, it takes a slow and thoughtful approach. We weren’t slow in rolling out MT, but we were slow in our savings expectations. We took the time to prove the value in terms that make sense to our internal clients, and we partnered with our suppliers to make this transition in a way that didn’t alienate the critical success factor in all of this, which is still the translator. Quality is more and more about balancing output with utility, and if this is done successfully, you can leverage MT across your content stack.

So what’s next, and where do we go from here? For us, it’s moving MT further into the pure marketing content realm. It won’t be easy, and it won’t show savings results soon. We plan to start leveraging the MT workflow with no discount to acclimate our supply chain. We also have to gain the confidence of marketing teams, teams that already press us hard for better quality.

The irony is, we already get complaints like “this looks like it was machine translated,” when it’s not, so what’s the harm in using MT in actuality?