What’s best?

Is there a ‘best’ automatic translation system? Sure. But you can only know by constantly testing them all across all languages against a vast range of specific tasks. In other words, never. “Best” looks like a Platonic idea, not a statement of fact.

Lots of blogs and sites these days are quoting the Language Weaver tag line – “the best translation systems in the world”. Does this help or hinder perceptions of translation automation, given the highly relative quality of any translation? Or should L W have taken a leaf out of Carlsberg’s book and added a modifier? The beer advertisement we sometimes see (at least in Europe) claims “probably the best beer in the world”. Again, meaningless, but carefully nuanced to engender that vital suspension of disbelief we need when reading ads.

As it happens, Language Weaver has a best rival – an outfit called Delta, which appears to market a Brazilian/Spanish-English system for 125 bucks a shot, a far cry from what L W’s Arabic to English system would cost. Yet when you check to see how many others are trying to best the market, the results are disappointing. According to this Google search, “best machine translation systems” – note the plural – scored 65 hits, and “best translation system” managed 145, most of them from Delta. With a hundred or so citations out of a billion web pages, it clearly doesn’t matter to most people whether any systems are “best” or not.

That plural in Language Weaver’s tag line is, however, an astute move. Unlike Delta (and many others), the company claims to develop language-specific systems (Arabic, Chinese, Hindi, French, Spanish to English), based on the analysis of statistical patterns of linguistic phenomena in large bitexts. In other words, it has some sort of “technology” (recently reported on here, probably causing the bloggorrhea attack) that it deploys in “systems”.

Most of us, however, have tended to think in terms of the X translation system – that is, the underlying engine in X, which somehow has linguistic knowledge hand-coded into it (a dictionary of, say, 100,000 terms and 3,000 grammar and morphology rules) to drive whatever language pairs are on offer. Perhaps by going superlative about its nonce systems, rather than its core “machinery”, L W is shifting the focus from technology to task diversification. And in the L W case, translation tasks appear to be addressed by leveraging the deciphered output from previous similar tasks.

Let’s hope that a provocative “best” attracts competitors to this space. Plenty of room for more “systems”, even if there are only one or two effective core technologies.


Andrew Joscelyne
European, a language technology industry watcher since Electric Word was first published, sometime journalist, consultant, market analyst and animateur of projects. Interested in technologies for augmenting human intellectual endeavour, multilingual méssage, the history of language machines, the future of translation, and the life of the digital mindset.
