Producing custom translations with machine translation
By Rick Woyde
As neural machine translation (MT) has matured, software developers and users alike are looking for innovative ways to produce better machine translations. For all the broken promises of “human quality” (except in the easiest of texts), MT has always lacked a way for users to easily improve and control the quality they receive. It’s pretty much always been a “you get what you get” experience for most MT users.
While artificial intelligence (AI) has made its way into MT, machine learning has until recently been limited by several factors, including the technology, the complexity of translation itself (getting high-quality translation at scale is complex), and the need to amass millions of words of bilingual content to train an MT engine. This last requirement alone eliminates 99% of users of MT software, as very few organizations think their previous translations have value or centrally store all (or most) of their documents and files in one place. Many organizations simply don’t have enough translated content to train an MT engine.
Fortunately, that’s all changing. MT software is maturing and is in a period of refinement, turning its attention to the details that separate a good translation from a poor one.
Generic MT: perfect for beginners
In many cases, out-of-the-box commercial MT software performs quite well. Users can easily get decent translations that meet their needs at a very low cost in hundreds of languages. These solutions offer speed and cost savings that no human can match. They are great for people who don’t require perfect translation quality and can disregard the occasional translation gaffe or stretch of gibberish.
While human translators are taught to chase the holy grail of translation quality and perform a craft when a commodity is most often needed, MT users are demonstrating that there are many use cases where perfect translation isn’t needed. For these users, time, cost, and other factors matter more. I know that’s a tough pill for translators to swallow. But it is what it is. MT adoption will continue to grow whether translators like it or not.
MT + translation memory: the next step
For savvier users who require better quality and are willing to invest some of their own time in improving translations, there’s MT + translation memory. Commercially, translation memory has been a core component of the translation process far longer than MT. However, only a fraction of the organizations that require translation use it. Implementing translation memory in your workflow requires additional software: you can’t create or access a translation memory in Microsoft Word or Excel. Even today, more than 40 years after the inception of translation memory, many organizations have never even heard of it.
As probably everyone reading this knows, translation memories are just bilingual databases that store text segments and their translations for future reuse. They’re really good for publications that contain repetitive text with long shelf lives, like owner’s manuals, service manuals (really any kind of manual), software strings, and HR publications. They’re not as helpful for advertising messages or short-life-cycle materials like website banners.
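To make the idea concrete, here is a minimal sketch in Python of a translation memory as a lookup table with fuzzy matching. The sample segments, the lookup function, and the 0.75 similarity threshold are all illustrative assumptions rather than how any particular translation tool works. Commercial products use more sophisticated matching and storage.

```python
from difflib import SequenceMatcher

# Hypothetical in-memory translation memory (English -> Spanish sample data).
translation_memory = {
    "Turn off the engine before refueling.": "Apague el motor antes de repostar.",
    "Check the oil level every month.": "Compruebe el nivel de aceite cada mes.",
}


def lookup(segment, memory, threshold=0.75):
    """Return (translation, score) for the closest stored segment.

    Returns (None, 0.0) when nothing in the memory is similar enough.
    The threshold is an assumed value chosen for illustration only.
    """
    best_source, best_score = None, 0.0
    for source in memory:
        # Character-level similarity between the new and the stored segment.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_source, best_score = source, score
    if best_source is not None and best_score >= threshold:
        return memory[best_source], best_score
    return None, 0.0


# A repeated manual sentence is found; an unrelated ad slogan is not.
print(lookup("Check the oil level every month.", translation_memory))
print(lookup("Our summer sale starts today!", translation_memory))
```

The second call shows why translation memory helps less with advertising copy: a one-off slogan has nothing similar stored, so there is nothing to reuse.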
For organizations that don’t produce the content being translated, translation memories won’t help much. For organizations that do produce their own content, storing and reusing translations is a no-brainer. It will likely save time and money, produce more consistent translations, and serve as the foundation for your MT strategy. Combining your translation memories with MT will produce better translation results and requires little more effort than exporting and importing a .tmx file. Most solutions will also allow you to upload a glossary or term base.
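For readers who have never opened a .tmx file, it is plain XML: each translation unit (tu) holds one variant (tuv) per language, and each variant wraps its text in a seg element. The sketch below reads source and target pairs using only Python’s standard library. The sample content and the read_tmx function name are my own illustrative assumptions, and real exports usually carry more metadata and inline formatting tags than shown here.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the standard XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, heavily trimmed TMX content for illustration.
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="none" adminlang="en" srclang="en" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Replace the air filter.</seg></tuv>
      <tuv xml:lang="de"><seg>Ersetzen Sie den Luftfilter.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_text, source_lang, target_lang):
    """Return a list of (source, target) segment pairs found in TMX text."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segments = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG, "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text nested inside inline tags.
                segments[lang] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            pairs.append((segments[source_lang], segments[target_lang]))
    return pairs


print(read_tmx(SAMPLE_TMX, "en", "de"))
```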
In this scenario, translators have to perform translation post-editing. It’s a laborious process of reading and editing MT that few translators seem to enjoy. Post-editing MT is substantially different from post-editing human translations. While human translators produce random mistakes, MT produces repetitive mistakes. This requires post-editors to identify the repetitive mistakes and correct them across the whole system. It’s a more systematic activity and would be better done by a linguistic engineer than a translator.
MT + machine learning: on the precipice of human quality
Here we go again, another MT professional advocating that MT can produce almost human quality translations. It’s a valid critique — but hear me out.
It’s never been easier to get near human-quality translation using MT.
As I wrote earlier, until recently the only way to produce custom translations was to pre-train an MT engine with previously translated content stored in a translation memory. As if that weren’t enough, you needed millions of words of content for the translation software to “learn” and improve translation quality. These were insurmountable hurdles for most companies. Most companies do not have a translation strategy; instead, they have translation needs that are managed by busy managers with lots of other tasks on their plates.
Today, with little more than a glossary (optimized for MT, of course), users can produce MT that really does approach human quality. Some translation applications can pull terms from a glossary during the translation process and apply them to all of a user’s translations in real time, reducing translation mistakes significantly. Users can produce almost human translations with little effort and time spent.
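How an application applies a glossary internally varies by product, and I won’t pretend to describe any specific one here. As a rough illustration of the underlying idea, the sketch below checks whether MT output uses the approved translation for every glossary term that appears in the source. The glossary entries, the sample sentences, and the missing_terms function are hypothetical.

```python
import re

# Hypothetical glossary: source term -> approved target term (English -> Spanish).
glossary = {
    "owner's manual": "manual del propietario",
    "dashboard": "tablero",
}


def missing_terms(source, mt_output, glossary):
    """List glossary terms present in the source whose approved
    translation does not appear in the MT output."""
    problems = []
    for src_term, tgt_term in glossary.items():
        src_pattern = r"\b" + re.escape(src_term) + r"\b"
        tgt_pattern = r"\b" + re.escape(tgt_term) + r"\b"
        in_source = re.search(src_pattern, source, re.IGNORECASE)
        in_target = re.search(tgt_pattern, mt_output, re.IGNORECASE)
        if in_source and not in_target:
            problems.append((src_term, tgt_term))
    return problems


source = "See the owner's manual for dashboard warnings."
mt_output = "Consulte el manual del usuario para ver las advertencias del tablero."
print(missing_terms(source, mt_output, glossary))
```

A simple whole-word check like this ignores inflected forms and plurals, which is one reason a glossary needs to be prepared with MT in mind.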
An article about MT wouldn’t be complete without talking about quality. It seems we often forget that human translation is far from perfect. As I’ve mentioned earlier, translators make mistakes too. I don’t know a single perfect translator out there, and I’ve worked with many great translators. But none of them were perfect.
Human translators make random mistakes — what we often call “human” mistakes. Sometimes they forget to capitalize a word, or they omit a sentence or paragraph. They make spelling errors. They forget to check words in a term base. I could go on, but I think you get my point.
MT also makes mistakes. Laughable mistakes in some cases. The good news is that it makes the same mistakes over and over again. While human translators can be inconsistent in their mistakes, MT software is almost entirely predictable.
Unlike human translators, MT engines don’t make such random and inconsistent mistakes. Their repeated mistakes follow patterns. When you view the mistakes made by MT as patterns, you can correct those patterns and correct many mistakes at once. You can analyze those patterns and identify their root cause. Most MT mistakes are caused by incorrect terminology. At least half of those are just human or organizational preference. In fact, it’s not entirely fair to call them mistakes; it’s more a lack of knowledge, like not knowing that I like steak fries more than crinkle-cut fries.
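Treating mistakes as patterns is easy to picture in code. In the sketch below, a single terminology rule is applied across a batch of MT output, so one correction fixes every occurrence at once. The Spanish sample sentences, the preference for “concesionario” over “distribuidor”, and the apply_corrections function are invented for illustration. This is a plain find-and-replace, not a description of how any MT engine learns.

```python
import re


def apply_corrections(segments, corrections):
    """Apply each (unwanted -> preferred) term rule to every segment.

    Returns the corrected segments and a count of fixes made per rule.
    Matching is whole-word and case-sensitive to keep the sketch simple.
    """
    counts = {unwanted: 0 for unwanted in corrections}
    fixed = []
    for text in segments:
        for unwanted, preferred in corrections.items():
            pattern = r"\b" + re.escape(unwanted) + r"\b"
            # A function replacement avoids backslash surprises in the term.
            text, n = re.subn(pattern, lambda m, p=preferred: p, text)
            counts[unwanted] += n
        fixed.append(text)
    return fixed, counts


# Hypothetical MT output in which the engine keeps choosing the same term.
mt_segments = [
    "Visite a su distribuidor local.",
    "El distribuidor confirma la cita.",
    "Guarde este manual.",
]
rules = {"distribuidor": "concesionario"}

corrected, fix_counts = apply_corrections(mt_segments, rules)
print(corrected)
print(fix_counts)
```

The same rule would be far less useful against human errors, which rarely repeat in the same form twice.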
Custom translations: pre-editing versus post-editing with MT
When it comes to MT, users now have more choices for creating custom translations. In the past, MT users had only one option, and its requirements were out of reach for most: it required millions of words of translated content, had to take place before you started translating, and took hours to complete. Training neural translation systems was available only to organizations with a lot of multilingual content. It also required some language engineering capabilities, and because these are learning systems, they sometimes learned the wrong meanings or terms and actually degraded translation quality instead of improving it.
At this point in time, we’re in an MT post-editing (MTPE) environment. Organizations typically use highly paid bilingual staff members or translators to post-edit MT, while translation companies attempt to retrain translators for the new MT post-editing process. That’s the most common MT-based workflow used to produce high-quality translations at scale. What if there’s a better way?
Today you can influence or pre-edit machine translations during the translation process, instead of receiving generic MT output. MT software is evolving into a dynamic machine learning environment to produce custom translations. Users can now train MT engines on the fly as they translate. There’s flexibility too. There are multiple ways to train an MT engine: you can push and pull translations to a translation memory, create glossaries, and edit translations in the translation software. Your MT software will use these assets to continuously deliver improved translations. More than anything, an MT-optimized glossary alone will improve translation quality dramatically, and users will see their post-editing load reduced to mostly word choices.
As MT technology continues to improve, we’re inevitably heading toward a future in which more and more translations are produced using MT and substantially less MTPE is required. In that future, users will be able to improve translation algorithms in real time, potentially bringing MT to almost human quality, and more users will be able to create custom translations with less effort than today.
Rick Woyde is the chief technology officer at Pairaphrase.