Multi-engine MT management for global companies

Marco Zappatore

Marco Zappatore, telecommunication engineer, works as adjunct professor of CAT Laboratory at the University of Salento (Italy). His research activities focus on data management, wireless propagation, computer-assisted and machine translation, as well as video game localization.

Filip Šanca

Filip Šanca is a graduate of Charles University in Prague. Having successfully established the academic partner network at Memsource, he is now coordinating the Memsource Certification Program and leading the marketing department.


Machine translation (MT) is a constantly evolving field, where research contributions from natural language processing, computational linguistics, artificial intelligence and computer science merge together with the aim of improving the quality and reliability of automatically translated texts. Several different MT approaches have showcased their advantages and limitations so far, thus motivating the adoption of multi-engine MT (MEMT), where multiple MT systems are queried at the same time in order to provide the best translation alternative.

However, significant costs, scarce automation and the risk of nonoptimized configurations make this promising research area less of a viable solution for global companies. There is a need for MEMT management systems capable of autonomously suggesting the optimal MT engine for the given language pair and domain to a translation company.

The research in the MT sector is experiencing continuous growth and diversification. Over the last decades, several MT approaches have been examined, starting from the widely known rule-based MT (RBMT). RBMT exploited a sequence of analysis, transfer and synthesis components to obtain a target language representation of the original text from a set of grammar rules. However, RBMT required adapting its components to each language, which limited its effectiveness. Statistical MT (SMT) then gained relevance. SMT systems embed a translation model that is trained statistically on parallel bilingual text corpora, meaning it infers statistical translation models from human translations already available. The translation model is followed by a language model that provides knowledge of the correct phraseological structure of the target language. Although more efficient than RBMT, the SMT approach has some limitations, as it requires a high similarity between the texts to be translated and corpus texts, as well as huge training corpora that must be correctly aligned and properly balanced. This makes SMT less suitable for less-resourced languages.

Further improvement has been introduced with neural MT (NMT), where multiple (hidden) layers of interconnected processing nodes (a neural network) are fed and activated with input language patterns (training material) so that it is possible to predict the likelihood of the target sentence structure. NMT requires considerably less memory than SMT, but its effectiveness is directly proportional to the availability of training material. It thus demands additional efforts such as a pivot language when under-resourced language pairs have to be dealt with. The inherent complexity and lexical/syntactic ambiguity of natural language, however, pose several challenges to MT approaches, such as homonymy, polysemy, homography, sentence structure, anaphora, idiomatic expressions and so on.

In order to tackle these issues, hybrid MT (HMT) techniques have been investigated with the aim of improving translation quality and accuracy. MEMT represents a promising HMT technique where several MT systems provide their own translation for a given source text (typically via online APIs) and then their output is properly combined. Several hybridization methodologies have been explored so far, all of which can be applied once each MT system has produced its translation. They include ranked selection based on BLEU scores; one-to-one combination of translation fragments using confusion networks; and many-to-many combination of translation fragments using lattice-based networks.
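The simplest of these methodologies, ranked selection, can be illustrated with a minimal Python sketch. It is not the scoring used by any particular MEMT system: `simple_bleu` is a deliberately simplified, smoothed stand-in for BLEU, the engine labels and sentences are invented for illustration, and a reference translation is assumed to be available, which is often not the case in production.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams of length n in tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified sentence-level BLEU: smoothed n-gram precision times brevity penalty."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    if not cand:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        # Add-one smoothing keeps the score defined when an n-gram order has no match.
        log_precision += math.log((matches + 1) / (total + 1)) / max_n
    # Penalize candidates that are shorter than the reference.
    brevity_penalty = min(1.0, math.exp(1 - len(ref) / len(cand)))
    return brevity_penalty * math.exp(log_precision)


def rank_outputs(outputs, reference):
    """Return (engine, score) pairs sorted from best to worst."""
    scored = [(engine, simple_bleu(text, reference)) for engine, text in outputs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical engine outputs for one source segment.
outputs = {
    "Engine A": "the cat sits on the mat",
    "Engine B": "a cat is sitting on mat",
    "Engine C": "cat mat",
}
for engine, score in rank_outputs(outputs, "the cat sits on the mat"):
    print(f"{engine}: {score:.3f}")
```

The top-ranked output is the one passed on to the translator. Confusion-network and lattice-based combination go further by merging fragments from several outputs, which this sketch does not attempt.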

Combining the output of different MT systems at the sentence, segment or fragment level brings a significant qualitative improvement, but cannot be considered the long-awaited silver bullet for translation companies. Indeed, MEMT-based approaches that exploit multiple MT APIs require a company to pay for more than one MT usage plan at a time while actually using just a small percentage of the overall text volume it paid for. Once ROI is factored in, the MEMT approach mostly makes sense at the job or file level, where a company can evaluate several engines at once yet pay only for the one translation it uses in the end. There are existing tools that already enable companies to implement this approach, while automating the MT selection process on the basis of MT quality scores and thus minimizing the manual effort.

The global MT market

Technology advancements in AI and deep learning are improving MT quality significantly by reducing post-editing efforts. Higher productivity rates and lower costs, if compared to human translation, are making MT an appealing choice for a continuously growing number of language combinations. Suitable content domains are extremely diversified and span from healthcare to marketing, from IT to automotive, from defense to education.

Such a favorable technological landscape finds one of its key drivers in business globalization. The more companies expand their customer base and their market segment abroad, the higher the request for fast and reliable content localization becomes. The global MT market is expected to expand at a compound annual growth rate of 19% and with a year-over-year growth rate of 15.3% between 2020 and 2024. The estimated incremental growth is from $400 million, the MT market size in 2016, to $1.5 billion, the MT market value projection for 2024. This is especially true in sectors like video content translation.
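As a quick sanity check on these figures, the compound annual growth rate implied by the two market-size endpoints can be computed directly. The sketch below uses only the $400 million (2016) and $1.5 billion (2024) values quoted above; the helper name `implied_cagr` is ours.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate that turns start_value into end_value over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the text: $400 million (2016) and $1.5 billion (2024).
rate = implied_cagr(400e6, 1.5e9, 2024 - 2016)
print(f"Implied CAGR 2016-2024: {rate:.1%}")
```

The result is roughly 18% per year, close to the quoted 19%. The small difference may simply reflect a different base period, since the forecast itself covers 2020-2024.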

Current forecasts, however, still depict a very fragmented market for the period 2020-2024. Technavio Research predicts in their report “Machine Translation Market by Application and Geography – Forecast and Analysis 2020-2024” that big players (such as Microsoft, Google, Amazon, IBM, SDL, SYSTRAN, DeepL and so on) will widen their MT service offerings and cope with MT quality improvement, but will keep coexisting with smaller vertical realities and open-source MT services.

Selecting the MT solution capable of providing the best translation quality depending on the required language combination and the targeted domain is expected to become an essential requirement for global companies. Recent dedicated reports have assessed stock MT services on a variety of configurations in order to highlight where a certain MT engine performs the best, but a comparison of their usage patterns by global companies has yet to be addressed.

Picking the right MT engine: A real test case

In order to provide useful insights on how companies actually exploit MT services, and how their usage patterns can be improved, let’s compare MT usage data in three selected NMT engines used at Memsource. Real engine names are not shown and only a fictional label is assigned to each of them (such as NMT Engine A) as the main purpose is to show how such engines are exploited and how their usage can be improved.

Figure 1: NMT engine performance by usage percentage and by language combination.

The scatterplot depicted in Figure 1 shows NMT engine performances by language combination and by usage. Five analysis variables are considered in Figure 1. On the horizontal axis, the percentage of usage is reported (that is, the ratio between the number of MT-translated segments and the overall number of segments in a given language combination) while on the vertical axis the average MTPE score is shown. Data points are:

 shaped in such a way that each NMT engine is represented with its initial (so A stands for NMT Engine A);

 colored according to the examined language combinations (so EN>CS is in green);

 sized depending on the overall number of segments for each language combination (so a bigger initial corresponds to a higher number of available segments).

For instance, the small orange A in the upper left reveals that the first NMT engine has an average MTPE score of 0.75 and a 2% usage ratio over a total of fewer than 1 million segments in the EN>NL language pair. Similarly, the big blue C in the middle of the chart shows that the third NMT engine has an average MTPE score of 0.65 and a 45% usage ratio over a total of more than 3.5 million segments for the EN>FR language pair.

Several insights come to light thanks to this scatterplot. First, there is clearly one engine (engine C) that is used the most for each language combination, as nearly all its data points sit in the rightmost part of the chart. However, if the corresponding average MTPE scores are considered, the most-used NMT engine never coincides with the best-performing one, for any language pair.

In addition, the best-performing NMT engines are the ones that are used the least. For instance, in the EN>DE language combination, engine C is the most-used one (63%), but its average MTPE score (0.58) is significantly lower than that of engine A (0.72), which has been used on only 15% of the 2 million segments for that language pair. This difference is even more striking for the EN>NL language combination, where the worst-performing NMT engine (engine C, average MTPE=0.64) is used very often (90% usage ratio), while the best-performing NMT engine (engine A, average MTPE=0.75) is almost unused (less than 3% usage ratio). It is worth pointing out that engine A is not available for the EN>CS language pair.
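The mismatch between usage and quality can be detected mechanically. The following sketch is an illustration rather than the analysis actually run on the Memsource data: it uses only the EN>DE and EN>NL values quoted above, omits engine B because its figures are not given in the text, and takes 2% as the usage ratio of engine A for EN>NL. The `EngineStats` type and `usage_mismatches` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineStats:
    engine: str
    usage: float  # share of segments translated with this engine (0 to 1)
    mtpe: float   # average MTPE score (higher is better in the article's charts)


# Only the values quoted in the text; engine B is omitted (figures not given).
STATS = {
    "EN>DE": [EngineStats("A", 0.15, 0.72), EngineStats("C", 0.63, 0.58)],
    "EN>NL": [EngineStats("A", 0.02, 0.75), EngineStats("C", 0.90, 0.64)],
}


def usage_mismatches(stats):
    """Yield (pair, most_used, best) when the most-used engine is not the best-scoring one."""
    for pair, engines in stats.items():
        most_used = max(engines, key=lambda e: e.usage)
        best = max(engines, key=lambda e: e.mtpe)
        if most_used.engine != best.engine:
            yield pair, most_used, best


for pair, most_used, best in usage_mismatches(STATS):
    gap = best.mtpe - most_used.mtpe
    print(f"{pair}: most used {most_used.engine} ({most_used.usage:.0%}), "
          f"best {best.engine}, MTPE gap {gap:.2f}")
```

For both language pairs the check flags engine C as the most used and engine A as the best scoring, which is the pattern described above.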

Figure 2: NMT engine performance by domain and by language combination.

Further details are provided by the scatterplot in Figure 2, where NMT engine performances (evaluated as MTPE scores, on the vertical axis) are broken down by source domain for each language combination. Source domains (horizontal axis) are reported with numerical IDs to make the chart more readable, and their actual meaning is listed in the informative text area at the bottom right corner of the same chart (domain 0 stands for medical, domain 1 stands for travel and hospitality, and so on). Average MTPE scores per language combination are shown as red reference lines. Each NMT engine is associated with a different color, so that several interesting elements can be spotted immediately.

First, engine A is the best-performing NMT engine (in terms of MTPE) in the majority of source domains for the available language combinations (all except EN>CS). Second, NMT engine C provides the lowest MTPE scores for the majority of source domains in all language combinations, since its data points are quite often below the average. Moreover, we can see that the best MTPE score has been achieved by NMT engine B (travel and hospitality, EN>NL) and the worst one by NMT engine C (gaming, EN>CS). NMT engine B scores close to the average for the majority of language pairs and domains.


Boosted by business globalization and supported by technological advancements, the MT market is gaining strength and importance, with significant growth rates expected in the next five years. Companies willing to improve productivity rates and to enlarge targeted customer segments cannot overlook MT solutions, as they promise a viable and affordable solution.

However, several challenges typically hamper the selection of the best MT engine: different MT approaches are available; multiple providers showcase heterogeneous service offerings; and MT quality differs significantly depending on the applied MT technique, language combination and content domain. This often leads to ineffective MT usage patterns by companies. Such patterns were identified by analyzing MT usage data on three different NMT engines in four different language combinations and 11 domains. The MEMT approach can cope with these challenges, as it allows querying more than one MT engine at the same time, but it is not a cost-effective solution. From a business point of view, then, proper MT management systems are helpful for automatically identifying the most suitable MT engine for a given translation service request.
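The core of such a management system can be pictured as a lookup over past quality scores. The sketch below is one possible design under simple assumptions, not a description of any existing product: it recommends the engine with the highest average MTPE score for the requested language pair and domain, and falls back to the pair-level average when that domain has no history. The function names and all records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def build_recommender(records):
    """records: iterable of (engine, language_pair, domain, mtpe_score) tuples."""
    by_domain = defaultdict(list)  # (pair, domain, engine) -> scores
    by_pair = defaultdict(list)    # (pair, engine) -> scores
    for engine, pair, domain, score in records:
        by_domain[(pair, domain, engine)].append(score)
        by_pair[(pair, engine)].append(score)

    def recommend(pair, domain):
        # Prefer engines with history for this exact pair and domain.
        candidates = {e: mean(s) for (p, d, e), s in by_domain.items()
                      if p == pair and d == domain}
        if not candidates:
            # Fall back to pair-level averages when the domain has no history.
            candidates = {e: mean(s) for (p, e), s in by_pair.items() if p == pair}
        return max(candidates, key=candidates.get) if candidates else None

    return recommend


# Hypothetical history, for illustration only.
history = [
    ("A", "EN>DE", "medical", 0.74), ("A", "EN>DE", "medical", 0.70),
    ("C", "EN>DE", "medical", 0.58),
    ("B", "EN>CS", "gaming", 0.61), ("C", "EN>CS", "gaming", 0.49),
]
recommend = build_recommender(history)
print(recommend("EN>DE", "medical"))  # exact pair and domain match
print(recommend("EN>CS", "travel"))   # falls back to pair-level averages
print(recommend("EN>JA", "medical"))  # no history for this pair
```

Because only one engine is queried per request, this kind of selection avoids the cost of paying several providers for the same text, at the price of relying on historical scores rather than on the actual output for the current job.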