How NMT is Revolutionizing Subtitling

Evgeny Matusov

Evgeny Matusov is currently the lead science architect of machine translation at AppTek, after holding the position of director of machine translation research at eBay. He received his diploma degree in computer science from RWTH Aachen University in 2003 and a PhD in computer science from the same university in 2009.

Advances in neural machine translation (NMT) technology have led media and subtitling companies to rely on it to assist translators in post-editing workflows. However, while NMT certainly offers a productivity boost, its cost and resource-utilization benefits are still constrained by the limits of the most popular machine translation (MT) systems.

That’s because off-the-shelf MT systems function rather like a black box ― you input data, and the system outputs the translation as running text, which translators then turn into subtitles by correcting and segmenting it as needed. Fewer MT errors and faster turnaround can be expected when an off-the-shelf solution is replaced by an NMT system customized to the media and entertainment domain using available in-domain data from this industry vertical. Such customized systems may also include components that automatically output semantically and syntactically segmented translations in a proper subtitling format.

Customization is needed to significantly increase the quality of the MT output, producing translations that better match the kinds of texts that actually need translating. It is with such customization that MT can be successfully implemented in the creative industries, something that was unimaginable not so long ago.

But from a language service provider’s point of view, needing high-quality MT for ten different domains or genres means training, deploying, and maintaining ten systems, even though some may sit idle much of the time. The result is a high environmental footprint and increased cost. There is also a risk of “over-fitting” the training: tuning a system so narrowly to a particular domain that its performance actually worsens on slightly different data.

Adding metadata transcends translation limitations

The latest advances in NMT offer a more productive and cost-effective approach. The technology has not only reached much higher quality, especially for in-demand, high-resource language pairs, but also offers flexibility not possible in previous NMT generations. A single NMT system can now be easily customized by supplying minimal additional metadata relevant to each unique domain and scenario that a business requires. The concept is similar to an audio mixer that allows you to shape the same sound in various ways. With NMT, the user simply adds an extra parameter value in an API call to generate the desired translation; for example, style=formal or length=short.

Specific metadata attributes that can be customized include:

  • Style, such as formal vs. informal language, which is often context-dependent. An especially tricky case has been the English “you” (both singular and plural), which is distinguished more explicitly in some other languages. Advanced NMT can now choose the right translation and adjust sentence structure if necessary.
  • Speaker gender, especially given that not all languages treat gender pronouns the same way.
  • Domain or genre, such as news, patents, talks, entertainment, and the like.
  • Topic, catering to more specific document-level style and terminology differences.
  • Length, generating shorter or longer translations with minimal information loss or distortion.
  • Language variety, where parallel training data for related languages or dialects can be combined in a single system, such as Castilian and Latin American Spanish, Canadian and European French, and others.
  • Extended context, assessing whether or not the context of the previous or the next source sentences should influence the translation of a given sentence.
  • Glossaries, relating to terms with official or mandatory translations, which the system may otherwise translate differently.
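To illustrate the mixer idea, here is a minimal sketch of how such metadata switches might be passed to an NMT service as extra key=value parameters in an API call. The endpoint shape, parameter names, and helper function are illustrative assumptions, not a real product interface.

```python
from urllib.parse import urlencode

def build_translation_request(text, source="en", target="es", **metadata):
    """Compose query parameters for a hypothetical NMT API that accepts
    metadata switches such as style, length, or speaker gender."""
    params = {"q": text, "source": source, "target": target}
    # Each metadata switch is passed as one more key=value parameter.
    params.update(metadata)
    return urlencode(params)

query = build_translation_request(
    "How are you?",
    target="de",
    style="formal",   # e.g. prefer the formal "Sie" over "du" in German
    length="short",   # ask for a compact, subtitle-friendly rendering
)
print(query)
```

Because the switches are just parameters, the same deployed model serves every domain and scenario; only the request changes.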


Setting parameters for neural machine translation, such as whether the text is addressing one person or many people.

The metadata can come from a variety of sources, including information about the origin of a translated document. It can also be computed from the text data itself using rules and regular expressions, or predicted through separate machine learning algorithms.
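As a toy example of the rule-based route, the following sketch uses a regular expression to guess whether an English line addresses one person or several, yielding a metadata flag a system could consult when choosing between singular and plural “you” in the target language. The cue list and function name are illustrative, and a production system would use far richer rules or a learned classifier.

```python
import re

# Phrases that strongly suggest the speaker is addressing several people.
PLURAL_CUES = re.compile(
    r"\b(you (all|both|guys)|y'all|everyone|ladies and gentlemen)\b",
    re.IGNORECASE,
)

def infer_number_of_addressees(line: str) -> str:
    """Toy rule: tag a subtitle line as addressing one person or many."""
    return "plural" if PLURAL_CUES.search(line) else "singular"

print(infer_number_of_addressees("Thank you all for coming."))  # plural
print(infer_number_of_addressees("Can you help me?"))           # singular
```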

Implementing this approach in MT system training allows a single model to serve many scenarios, reducing both environmental footprint and cost. Switches for the various captured metadata can be exposed through APIs, so that translator interfaces and platforms can leverage this flexibility and offer UI solutions that support the corresponding workflows.

Honing accuracy

Metadata customization is not the only way to increase translation accuracy. Regular model retrainings, informed by feedback from linguists, also help steer the quality of the output closer to what each business requires. One way to achieve this is with reinforcement learning.

Reinforcement learning here means training on post-edited data, which allows the system to learn to correct its own mistakes, rather than relying on other translations that may cover additional domains but will not directly fix translation errors. It is very important for continually maturing the MT platform and improving accuracy. However, since it also poses a risk of over-fitting, expert intervention and accompanying knowledge transfer are needed in the system training and deployment process.

Of course, translation accuracy is a key concern, and predicting it is harder still. It is possible today to output quality estimates of how confident the MT system is about a translation. These confidence scores can then be used in translation management systems to set thresholds for routing documents to different translation workflows: to a light or full post-editing workflow, or to a classic translation, editing, and proofreading workflow in which a translator edits or translates from scratch.
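A minimal sketch of such threshold-based routing is shown below; the threshold values and workflow names are illustrative assumptions, and real thresholds would be tuned per language pair and content type.

```python
def route_segment(confidence: float,
                  light_pe_threshold: float = 0.85,
                  full_pe_threshold: float = 0.60) -> str:
    """Route a machine-translated segment by the system's confidence score.
    High-confidence output needs only light post-editing; low-confidence
    output is handed to a translator to redo from scratch."""
    if confidence >= light_pe_threshold:
        return "light post-editing"
    if confidence >= full_pe_threshold:
        return "full post-editing"
    return "translate from scratch"

for score in (0.92, 0.70, 0.40):
    print(score, "->", route_segment(score))
```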

What does this mean?

These NMT-driven advances can deliver significant benefits to media clients, language service providers, and the post-editors and subtitlers who shoulder the high-pressure responsibility of accurate subtitling.

Having proper data regarding style and length, for example, means fewer words and sentences would need to be post-edited. By flipping the switch to the desired metadata, translators can experiment with the technology and get a new tool to make their daily work easier. By applying confidence measures, a translator can quickly go through easier translation segments and put more energy into harder ones. What’s more, the number of easier segments increases over time due to machine learning.

A smart user interface that takes advantage of the full flexibility offered by today’s MT systems could easily solve many of the issues post-editors currently report in production. The mission is to give organizations greater control over the subtitling output they receive. Ultimately, it’s all about better business performance and results.