Implementing Machine Translation in Subtitling

In the past year, we have witnessed accelerated technology implementation across multiple industry verticals and geographies as companies pursue efficiencies, productivity increases and cost savings, a trend boosted by the global COVID-19 pandemic. One of these technologies is machine translation (MT), now being implemented across a wider spectrum of markets, including the creative industries, in particular in subtitling and other audiovisual localization workflows. Other examples can be found in MultiLingual's just-released March/April issue covering games and multimedia.

To a language service provider (LSP), machine translation sounds very appealing in terms of cutting down turnaround times and cost. But getting the right technology to suit one's needs is only half the story; implementing it is the other half. MT-augmented workflows require not just solid MT performance on the backend but also an appropriate, user-centric design, if the user is to benefit and be 'augmented' by the technology. The way the translator interacts with MT often makes all the difference to the success of the implementation.


Users don't care about the technology; they care about the product. It has been pointed out before, by me and others, that although automation of the technical aspects of subtitling has reached a state-of-the-art level, very little progress has been made on efficiency tools for the translation part of the workflow, leaving plenty of room for opportunities and innovation. So how can the translator be provided with a handy set of tools that facilitates the subtitling task?

All the tools a translator needs to resort to, like term bases, spell checkers, and quality assurance (QA) tools, have to be integrated together with the MT in the translator's regular working environment. Being able to carry out the majority of your work within the same interface facilitates the task and reduces completion time. The 2020 European Language Industry Survey shows that translators still call for better integration of computer-assisted translation (CAT) tools in their editor environments. (The survey was conducted by the European Union Association of Translation Companies in collaboration with the European Language Industry Association, the International Federation of Translators/Fédération Internationale des Traducteurs Europe, and the Globalization and Localization Association, and supported by the European Master's in Translation network and the European Commission's Language Industry Platform group.) QA tools in subtitle editors are mostly concerned with the timing, formatting and readability of the subtitles from a non-linguistic point of view, and spell checkers are the only real translation aid currently available in most subtitle editors.

While glossaries and term bases are part of the standard workbench at audiovisual LSPs, such tools are still rarely integrated within the subtitle editor itself. A common issue in neural machine translation (NMT) is that the system translates proper names when it should not. Well-designed glossary integration would overcome this problem at a post-processing stage, before the output is seen by a translator.
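As a rough illustration of glossary enforcement at the post-processing stage, the Python sketch below restores names that an MT system has translated by mistake. It is a minimal sketch under stated assumptions: the glossary structure, the example entries and the function name apply_glossary are hypothetical, and a production system would need morphology-aware matching rather than exact string replacement.

```python
import re

# Hypothetical glossary: source term -> (required target form, unwanted MT renderings).
GLOSSARY = {
    "Rose": ("Rose", ["Rosa"]),
    "Mr. Stone": ("Sr. Stone", ["Sr. Piedra", "señor Piedra"]),
}


def _whole_term(term: str) -> str:
    """Build a pattern that matches the term only when it is not part of a longer word."""
    return rf"(?<!\w){re.escape(term)}(?!\w)"


def apply_glossary(source: str, mt_output: str, glossary=GLOSSARY) -> str:
    """Replace unwanted renderings of glossary terms in the MT output."""
    fixed = mt_output
    for term, (required, unwanted) in glossary.items():
        # Only act when the glossary term really occurs in the source segment.
        if not re.search(_whole_term(term), source):
            continue
        for bad in unwanted:
            # A function replacement avoids backslash interpretation in `required`.
            fixed = re.sub(_whole_term(bad), lambda _m, r=required: r, fixed)
    return fixed


print(apply_glossary("Rose called Mr. Stone.", "Rosa llamó al señor Piedra."))
```

Checking the source segment first keeps the fix conservative: a Spanish sentence that legitimately contains the word "Rosa" is left alone when the source has no "Rose" in it.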

When subtitling, translators frequently perform search operations in order to find, for example, how a phrase was translated in a previous episode of a series so as to reuse that translation. Yet the use of translation memories or concordance searches in media localization is virtually non-existent. Audiovisual translators have long asked for CAT tool features in subtitling editors, as an AVTE video interview also highlights. Perhaps the preparation of archived data for MT system customization will give language service providers an opportunity to also address this translator pet peeve.
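To make the idea concrete, a concordance search over archived subtitle pairs can be as simple as the Python sketch below. The SubtitlePair record and the concordance function are hypothetical names used for illustration only; a real translation memory would add fuzzy matching and indexing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SubtitlePair:
    """One archived subtitle with its translation (hypothetical record layout)."""
    episode: str
    source: str
    target: str


def concordance(query: str, archive: List[SubtitlePair]) -> List[SubtitlePair]:
    """Return every archived pair whose source text contains the query, ignoring case."""
    needle = query.casefold()
    return [pair for pair in archive if needle in pair.source.casefold()]


archive = [
    SubtitlePair("S01E01", "Winter is coming.", "Se acerca el invierno."),
    SubtitlePair("S01E02", "He said winter is coming.", "Dijo que se acerca el invierno."),
    SubtitlePair("S01E02", "Summer is over.", "Se acabó el verano."),
]

for hit in concordance("winter is coming", archive):
    print(hit.episode, "|", hit.source, "|", hit.target)
```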

When introducing new functionality in the user interface, commonly used commands should follow established interactions so that the tool is intuitive and comfortable to work with. The potential of the editing software could be further explored to see if it might make sense to integrate additional functionality for post-editing or review tasks that involve repetitive actions. Customizable hotkey usage has always been one of the main asks of subtitlers, who are admittedly faster when able to use the keyboard exclusively for all their interactions.  

In the subtitling market, such repetitive actions during post-editing might be a result of the MT text layout rather than the contents of the automatic translation. The way MT text is broken into subtitle lines and boxes is not necessarily a feature of the MT itself, but it often makes a significant difference to the amount of post-editing effort required. Having to move text — even if correct — between different subtitle boxes with merge and split operations increases post-editing effort considerably. It is therefore good practice to implement post-processing to reconstruct the MT text in the right format before it is presented to a post-editor.  

Using intelligent text segmenters will reduce the amount of editing that would be required for the creation of perfectly constructed subtitles. In fact, recent research highlights that the top item post-editors complain about when working with automated subtitles is poor text segmentation of the MT output. Some segmentation tools are developed by subtitling software providers themselves, who have had to accommodate workflows that involve the use of scripts to produce subtitle files. Others are developed by MT providers that specialize in the subtitling market and want to ensure their MT output complies best with the expectations of the end users. 
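The kind of rule such a segmenter applies can be sketched in a few lines of Python. The example below is an illustrative simplification, not any vendor's actual algorithm: it fills each line up to a character limit, prefers to break after punctuation, and otherwise avoids ending a line on a function word. The 42-character default, the two-line box size and the NO_BREAK_AFTER word list are assumptions chosen for the example.

```python
from typing import List

# Hypothetical list of words that should not be left dangling at the end of a line.
NO_BREAK_AFTER = {"a", "an", "the", "of", "to", "in", "and", "or"}
PUNCTUATION = ".,;:?!"


def break_lines(text: str, max_chars: int = 42) -> List[str]:
    """Split text into lines of at most max_chars, preferring linguistically sensible breaks."""
    words = text.split()
    lines = []
    while words:
        # Count how many words fit on the line (an overlong single word gets its own line).
        count, length = 0, 0
        for word in words:
            new_length = length + len(word) + (1 if count else 0)
            if new_length > max_chars and count:
                break
            length, count = new_length, count + 1
        if count < len(words):
            best = count
            # Prefer the latest punctuation break in the second half of the line.
            for i in range(count, count // 2, -1):
                if words[i - 1][-1] in PUNCTUATION:
                    best = i
                    break
            else:
                # Otherwise back off so the line does not end on a function word.
                while best > 1 and words[best - 1].lower() in NO_BREAK_AFTER:
                    best -= 1
            count = best
        lines.append(" ".join(words[:count]))
        words = words[count:]
    return lines


def to_boxes(lines: List[str], max_lines: int = 2) -> List[List[str]]:
    """Group consecutive lines into subtitle boxes of at most max_lines lines."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


for box in to_boxes(break_lines("She walked to the station in the morning", max_chars=17)):
    print(box)
```

Timing is deliberately left out here; in practice the boxes would also have to respect the timecodes and reading-speed limits of the subtitle file.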

The segmentation of subtitles can be performed as a post-processing step on either speech recognition or machine translation output. Our team's small-scale experiments in Latin American Spanish showed the impact such intelligent text segmentation had in reducing subtitle edit rate and post-editing effort.

As we implement MT in subtitling, the user interface needs to be adapted further for the post-editing task. Translators sometimes complain about the MT acting as a block to their creativity, so a lot of thought should be spent on how best to present the MT output to the user. Providing post-editors with the MT output in a box for them to overtype is indicative of poor design that does not focus on the end user. A deeper implementation, creative and empathetic, is needed that aims to put the translator in the driver's seat. This can be achieved, for example, by providing information about the MT to the translator (e.g., where the MT comes from if multiple sources are used, or how to distinguish it from TM matches), as well as by making dynamic changes to the MT output on the basis of translator edits.

One of the most crucial factors for a translator to be able to post-edit effortlessly is the ability to make quick decisions as to the usefulness of a given MT output. Experiments and user feedback indicate that more cognitive effort is required of translators when deciding what to do with bad or mediocre MT output. In fact, when such MT output is hidden from post-editors' view altogether, translator productivity increases across the board.

Aside from ensuring that the translator gets the highest possible quality of MT to work with, making the MT quality predictable makes it more useful, as decisions on whether to post-edit or not can be made much faster. Given that different translators benefit differently from MT, it further makes sense to make MT quality thresholds configurable at the user level, so translators can decide for themselves how much of the MT output they would like to see. Quality estimation is a very hot topic right now, with many researchers putting effort into developing such systems, as the high number of participants in the quality estimation shared task at last year's Conference on Machine Translation (WMT) shows. Some implementations of quality estimation are already on the market and we expect to see more such solutions making their appearance.
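A user-level threshold of this kind could look like the Python sketch below. It assumes that each segment arrives with a quality estimation score between 0 and 1 from some upstream system; the Segment record, the qe_score field and the suggestion_for function are hypothetical names, and the scores shown are made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    source: str
    mt_output: str
    qe_score: float  # assumed: 0.0 (unusable) to 1.0 (publishable), from an upstream QE system


def suggestion_for(segment: Segment, user_threshold: float) -> Optional[str]:
    """Return the MT output if it meets the translator's own threshold, else None.

    Returning None means the editor shows an empty box and the translator
    works from scratch instead of evaluating a poor suggestion.
    """
    if not 0.0 <= user_threshold <= 1.0:
        raise ValueError("user_threshold must be between 0 and 1")
    return segment.mt_output if segment.qe_score >= user_threshold else None


segments = [
    Segment("Where are you going?", "¿Adónde vas?", qe_score=0.92),
    Segment("Break a leg!", "¡Rompe una pierna!", qe_score=0.31),
]

# One translator prefers to see only confident output; another wants to see everything.
print([suggestion_for(s, user_threshold=0.7) for s in segments])
print([suggestion_for(s, user_threshold=0.0) for s in segments])
```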

The latest trend in MT research that aims to further increase MT quality has to do with the use of metadata to supercharge and inform the MT output. Such metadata are particularly useful in audiovisual localization for speaker identification and register. By feeding the machine with some of the audiovisual context, it is possible to reduce relevant errors, such as gender, and provide truly customized output for a given project or at the segment level. Even a phrase as simple as “I am happy” will be translated differently in some languages depending on the gender of the speaker; providing such context will help reduce grammatical agreement errors that seem to annoy translators far beyond the level of complexity required to correct them. 

Experiments have shown that when style metadata specifies the formal or informal register of a text, the MT system will output text that differs not simply at the level of grammatical register (e.g., the use of the correct form of the second person pronoun 'you' as a formality and politeness marker) but, interestingly, also in terms of vocabulary choices. In fact, vocabulary and fluency choices in connection with idioms were the second most important type of error that post-editors commented negatively on in a recent extended study conducted by Maarit Koponen and others on post-editing in the subtitling domain. Such a toggle between formal and informal style in the MT output could be triggered both at the project level and at the individual segment level.
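One common way of passing such metadata to an MT system is to prepend control tags to the source text, with segment-level values overriding the project defaults. The Python sketch below illustrates only that input preparation step. It assumes an MT system that was trained to recognize such tags; the tag format, the ALLOWED values and the build_mt_input function are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical metadata keys and values the MT system is assumed to have been trained with.
ALLOWED = {
    "register": {"formal", "informal"},
    "speaker_gender": {"female", "male"},
}


def build_mt_input(
    source: str,
    project_meta: Dict[str, str],
    segment_meta: Optional[Dict[str, str]] = None,
) -> str:
    """Prefix the source with metadata tags; segment values override project values."""
    meta = {**project_meta, **(segment_meta or {})}
    tags = []
    for key in sorted(meta):  # sorted so the tag order is stable
        value = meta[key]
        if key not in ALLOWED or value not in ALLOWED[key]:
            raise ValueError(f"unsupported metadata: {key}={value}")
        tags.append(f"<{key}:{value}>")
    return " ".join(tags + [source])


project = {"register": "informal"}  # e.g., an action film
print(build_mt_input("I am happy.", project, {"speaker_gender": "female"}))
print(build_mt_input("Please rise.", project, {"register": "formal"}))  # courtroom scene
```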

Another, less obvious example of the use of metadata in MT is the ability to produce shorter translations, to better suit the space and time constraints of subtitling. This is an improvement that has been suggested by post-editors in two different experiments, involving a statistical machine translation (SMT) and an NMT system. In both cases, translators had to deal with MT outputs that may have been correct translations of the source but which nevertheless needed to be edited down for readability reasons. Controlling the length of the output is now a possibility in systems that take the source vs. target text length ratio as additional input in the training, and can thus control the length of the output accordingly.
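The training-side preparation for such length control can be sketched as follows, assuming a scheme in which each training pair is labeled with a length token derived from its target-to-source character ratio. The token names, the 0.95 and 1.05 thresholds and the function names are illustrative assumptions, not the settings of any particular system.

```python
from typing import Tuple


def length_token(source: str, target: str, low: float = 0.95, high: float = 1.05) -> str:
    """Classify a training pair by its target/source character length ratio."""
    if not source:
        raise ValueError("source must not be empty")
    ratio = len(target) / len(source)
    if ratio < low:
        return "<short>"
    if ratio > high:
        return "<long>"
    return "<normal>"


def tag_training_pair(source: str, target: str) -> Tuple[str, str]:
    """Prefix the source with the length token observed for this pair."""
    return f"{length_token(source, target)} {source}", target


def request_short(source: str) -> str:
    """At translation time, ask the trained system for a condensed rendering."""
    return f"<short> {source}"


print(tag_training_pair("I really do not think that is a good idea.", "No es buena idea."))
print(request_short("I really do not think that is a good idea."))
```

A system trained on pairs tagged this way learns to associate each token with a relative output length, so a subtitler could request the short variant for fast dialogue or for formats with tight line limits.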

When it comes to the creative industries, style is at the core of the translation process: the style and tone of the translation must be appropriate and culturally adjusted as needed for the target locale. With highly performing customized MT systems and metadata controls, translators are put in the driver's seat and can steer the MT engine as they please to get the most benefit from its abilities. For example, one could choose to set the style of the MT output to "informal" when translating an action film, yet be able to switch to "formal" for the courtroom scene in the film; or have access to alternate translation options with different attributes, such as length in the case of video material with high speech rates or for caption file formats with a low character-per-line limit.

Going forward, MT post-editing is destined to play an increasingly strategic role in localization planning and become a standard feature of localization production for businesses of all sizes. One after the other, language service providers in the audiovisual industry are experimenting with the use of MT in their workflows and are at various stages of implementation. The introduction of spell checkers in subtitle editors has already resulted in significant time savings for translators, and MT is the next powerful tool to add to the translator's tool belt, one that can bring even greater work efficiencies. Careful integration of MT in the editor interface is thus not only important for a successful implementation of any MT system but can also be a determining factor for its longevity in production. The key to success is taking a user-first applied AI approach and making the translator a partner in the integration process.

Yota Georgakopoulou
With a PhD in translation and subtitling, a stint in academia, participation in multiple research projects, and over 20 years of experience in the industry in senior management roles at multinational language service providers, Yota is a leading audiovisual localization expert, specializing in the application of language technologies in media localization. She offers her services to high-profile organizations around the world as an independent consultant, advising on strategy, quality, tools, workflows, and language resource and data management.
