Measuring the benefits of using SMT

At present there is no precise indication of the benefits of using statistical machine translation (SMT) for potential users. The question “is this going to save me time and money, and if so how much?” is not addressed in any systematic way. The common answer provided by most SMT service providers is “well, it depends.” This is far from the answer that users need in order to make an informed decision about whether to go ahead with SMT.

What the industry lacks today is a description of the main factors that affect the quality of SMT output, and of how they can be used to estimate the savings that SMT will provide. In the end, the decision on whether to use SMT depends on the amount of time saved during translation. This paper estimates the savings you can expect from the key factors that affect SMT quality, using a simple calculation that yields the percentage reduction in translator effort (PRTE) for a given localization project.

Translation cost

Translation forms only part of the cost of localization, and it is all too easy to forget the other elements of the overall localization process and their costs. In fact, translation itself typically accounts for only between 30% and 50% of the overall cost of a localization project, depending on how much automation is involved in the localization workflow. Figure 1 shows the standard cost model for a manual localization process.

The other costs of localization, apart from the profit made by the localization service provider, are the management and administrative costs, as well as proofreading, review and correction. An automated translation management system (TMS) can significantly reduce the administrative and management costs of the localization process.

PRTE calculation

Having put the cost of translation into perspective, we can now look at the factors that affect the quality of SMT and consequently the PRTE.

PRTE can be defined as the percentage reduction in translator effort achieved by using SMT, compared to human translation on its own. PRTE is the key factor that decides how much you can expect to save by using SMT for a given project. The quality of SMT is governed by three major factors:

The language closeness: the similarity of the source and target languages in terms of morphology, word order and grammar.

The amount of training data.

The relevance of the training data to the current text being translated.

If we provide mathematical weightings to these factors, we can use them very effectively to provide a calculation of the percentage translator productivity we can expect to achieve using SMT. In order to provide a percentage, we will use a probability type estimation for each factor with a range of 0 to 1, where the value 1 assumes an idealized perfect situation and 0 the opposite.

Let us now consider these factors in detail.

Language closeness

SMT output is affected by the differences between the source and target languages, including grammar, word order and morphology. To put it simply, the closer the two languages are in grammar, word order and morphology, the better the outcome. To take the extreme case of US English to UK English, we can state that the language closeness is 1.0, as the two varieties of English differ only in some spellings. Using English as the source again and this time French as the target, we can assume a closeness value of 0.8, as both languages have similar basic morphology and word order. For English to German, we would use a value of 0.6, as the differences in morphology and word order are much more pronounced. For English to Russian or Polish, the proposed value would be 0.45, and for English to Japanese it would be 0.25, as there are significant differences in word order and morphology between the two languages.

A good indication of the difference in language models can be found at — this site provides a comparison for some major languages concerning the difficulties that native speakers of those languages have in learning English. The degree to which these students have issues with learning English is also indicative of the basic differences in grammar and morphology between their native tongue and English, and also indicative of the difficulties posed in terms of SMT between English and those languages.

Figure 2 provides an indication of the language closeness factors where English is the source language. The factors have been arrived at from personal experience and require further investigation, but they are a good starting point.

If all other factors affecting SMT quality are in an ideal state, language closeness is the only factor that limits the result. Figure 3 shows the expected productivity improvement in that case, where English is the source language, depending on the target language.
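As a minimal sketch of that case, the language closeness values proposed above can be held in a lookup table and converted to a percentage. The names `LANGUAGE_CLOSENESS` and `lc_only_improvement` are illustrative and do not come from any SMT toolkit. The values are the rough, experience-based figures from the text.

```python
# Language closeness (LC) values proposed in the text, with English as source.
# These are rough, experience-based estimates, not measured constants.
LANGUAGE_CLOSENESS = {
    "UK English": 1.0,
    "French": 0.8,
    "German": 0.6,
    "Russian": 0.45,
    "Polish": 0.45,
    "Japanese": 0.25,
}


def lc_only_improvement(target: str) -> float:
    """Expected improvement (%) when LC is the only limiting factor.

    Assumes the training set size and domain similarity factors are both 1.
    """
    return LANGUAGE_CLOSENESS[target] * 100.0


for target in LANGUAGE_CLOSENESS:
    print(f"English -> {target}: {lc_only_improvement(target):.0f}%")
```

A table like this is easy to extend with further target languages once better-founded closeness values are available.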

Training set size factor (TSSF)

The next key factor regarding SMT quality is the size of the training set. If the training set is too small, there will not be enough data to build an adequate translation model. When there is no training data, TSSF should be 0. As the amount of data increases, TSSF approaches 1, so a TSSF of exactly 1 would correspond to infinite training data. We use the equation below to estimate TSSF:

      $$\mathrm{TSSF} = 1 - 2^{-\mathrm{Size}/\mathrm{Size}'}$$


Here Size is the actual training data size, and Size’ is the empirically determined data size at which TSSF equals 0.5 (that is, half of the effort is saved).

What this means is that a training set size of Size’ would result in a reduction of the translation effort of 50%. In practical terms Size’ would normally equate to around 50,000 segments, depending on the material being translated. With that value, a training set size of 10,000 segments would produce a TSSF of about 0.13, whereas 100,000 segments would result in a TSSF of 0.75 and 200,000 segments would produce a TSSF of 0.9375.
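The calculation can be sketched in a few lines, reading the equation as TSSF = 1 - 2^(-Size/Size’) and taking Size’ as 50,000 segments, the practical figure quoted above. The function name `tssf` and the parameter name `half_size` are chosen here for illustration only.

```python
def tssf(size: float, half_size: float = 50_000) -> float:
    """Training set size factor: 1 - 2 ** (-size / half_size).

    size      -- number of segments in the training set
    half_size -- training set size at which TSSF equals 0.5 (Size' in the
                 text); 50,000 is the rough practical value quoted there
    """
    if size < 0 or half_size <= 0:
        raise ValueError("size must be >= 0 and half_size must be > 0")
    return 1.0 - 2.0 ** (-size / half_size)


for segments in (50_000, 100_000, 200_000):
    print(f"{segments:>7,} segments -> TSSF = {tssf(segments):.4f}")
```

Because Size’ depends on the material being translated, it is kept as a parameter rather than a fixed constant.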

Using the above assumptions, as a very rough rule of thumb, you can assume that a training set size of 250,000 or more segments would provide a TSSF value approaching 1. Anything less would reduce the TSSF value by roughly 0.1 for every reduction of 25,000 segments in the training set size.
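Written out, the rule of thumb is a simple linear approximation. The sketch below is one possible reading of it, with the hypothetical name `tssf_rule_of_thumb`. It is deliberately coarse and is not expected to match the exponential equation exactly, particularly for small training sets.

```python
def tssf_rule_of_thumb(size: float) -> float:
    """Linear rule of thumb for TSSF.

    Treats 250,000 or more segments as TSSF = 1 and subtracts 0.1 for every
    25,000 segments below that, never going below 0.
    """
    shortfall = max(0.0, 250_000 - size)      # segments missing from 250,000
    estimate = 1.0 - 0.1 * shortfall / 25_000
    return max(0.0, estimate)


for segments in (250_000, 200_000, 150_000):
    print(f"{segments:>7,} segments -> TSSF of roughly {tssf_rule_of_thumb(segments):.1f}")
```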

A constant problem with SMT is the issue of out-of-vocabulary words: these are words that have not been encountered previously in the training set. If the training set size is too small then you can expect a commensurate increase in out-of-vocabulary word instances, and therefore more work for the translator.

For the purposes of the PRTE calculation we can again assume a value between 1 (ideal training set size) and 0 (no training set). Zero would be an improbable value (we would not be able to build an SMT engine with no training data), but we can see that a lack of training data would have a significant impact on the quality of the SMT.

Domain similarity (DMS)

Empirical evidence has shown that the quality of SMT also depends on the quality of the training set. A smaller training set on the same topic domain produces much better results than using a generalized training set. Specific domains have their own vocabulary and phraseology that cannot be rendered with a general SMT engine.

For the purposes of the PRTE calculation we can assume a value between 1 (training data from exactly the same specific domain and exactly the same organization) and 0 (a completely unrelated specific domain). A generic SMT engine would rate 0.25 where the subject matter being translated relates to a highly specific domain with its own detailed terminology.

PRTE formula

The PRTE formula itself combines all three of the above factors, language closeness (LC), TSSF and DMS, into an overall calculation that is easy to implement:

PRTE = (LC x TSSF x DMS) x 100%.

This can be represented by the three-dimensional graph in Figure 4.

To test the validity of the formula we can try some examples:

Translating from US English to UK English we can assume a language closeness value of 1. If we have an ideal reference TSSF of 1 and an ideal DMS of 1, we arrive at a PRTE of 100% (because 1 x 1 x 1 x 100 = 100%). This would mean that the SMT output should require no translator intervention, providing a productivity figure of 100%.

Translating from English to French, we can assume a language closeness value of 0.8. If we have a slightly less than ideal TSSF of 0.75 but with an ideal DMS of 1, we arrive at a PRTE of 60% (because 0.8 x 0.75 x 1 x 100 = 60%). This would mean that we should expect an improvement regarding translator productivity of 60% compared with a completely manual human translation.

Translating from English to Japanese, we can assume a language closeness value of 0.25, as proposed earlier. If we have an ideal TSSF value of 1 and an ideal DMS of 1, we arrive at a PRTE value of 25% (because 0.25 x 1 x 1 x 100 = 25%). This would provide an estimated 25% improvement in translator productivity.
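These worked examples are straightforward to reproduce. The following is a minimal sketch of the formula, not a reference implementation. The function name `prte` and its range check are assumptions added for illustration.

```python
def prte(lc: float, tssf: float, dms: float) -> float:
    """Percentage reduction in translator effort: LC x TSSF x DMS x 100.

    lc   -- language closeness, 0 to 1
    tssf -- training set size factor, 0 to 1
    dms  -- domain similarity, 0 to 1
    """
    for name, value in (("lc", lc), ("tssf", tssf), ("dms", dms)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return lc * tssf * dms * 100.0


# US English -> UK English with ideal training data
print(f"{prte(1.0, 1.0, 1.0):.0f}%")
# English -> French with a slightly less than ideal training set size
print(f"{prte(0.8, 0.75, 1.0):.0f}%")
```

Because the three factors are multiplied, a single weak factor caps the whole estimate. For example, a generic engine with a DMS of 0.25 cannot exceed a PRTE of 25%, however close the languages or large the training set.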

Estimating wisely

The PRTE formula is not designed to be a hard and fast assessment of the expected percentage reduction in translator effort, but rather an overall rough estimation of what can be expected. The DMS and TSSF figures in particular will often be at best an educated guess. The language closeness values are also a rough approximation, and some SMT systems with an appropriate amount of tuning will be able to achieve better values. The formula also does not take into account the differences between individual SMT engines: some will inevitably be better than others. The amount of manual tuning also needs to be taken into account, as it requires the input of highly skilled engineers.

Nevertheless, the PRTE formula provides a guide to what is achievable for a given situation and a rough idea of the returns that can be expected. This is vastly better than nothing, or “well, it depends,” which is the current situation.