Word on the Street

Round robin evaluations for long tail languages

Adam Wooten

Adam Wooten, localization consultant and trainer, has more than 15 years of experience in the language industry. He is an assistant professor in the translation and localization management program at the Middlebury Institute of International Studies, and the former CEO and cofounder of AccuLing.

“Oh, wow! Really?! Your company works in that many languages?! That’s amazing!”

We’ve all heard it before. People previously unfamiliar with the language services industry will gasp in awe when they hear an agency works in more than 100, 150 or 250 languages. But industry veterans know these numbers are the norm, not an impressive exception. Every language service provider (LSP) works in more than 100 languages — or at least they can — because it is easy to find the contact information for talent in hundreds of languages.

More important than offering a specific language is knowing that the translation in that language can be trusted. That raises an issue far more important than language count: how can a company trust the translations provided in so many languages?

Many methods exist for accrediting, certifying or otherwise verifying the quality of translation talent, but many of them reach their limits before an LSP has reached the 100-language mark.

The challenge of 7,097 languages

Ideally, a single person or entity could somehow assess and certify the quality of translators and translation providers in any language. However, with 7,097 languages spoken in the world today, as counted by Ethnologue.com, it is absurd to think that any single person could possibly master them all. And no certifying body has come close to offering widely accepted accreditation of translators, interpreters or other language talent in even 100 languages.

So how can LSPs trust each translator in so many languages?

Partial solution: Certifications

Some LSPs may avoid getting into the details by claiming they only work with talent certified by a recognized organization like the ATA (American Translators Association) or the NAATI (Australia’s National Accreditation Authority for Translators and Interpreters). However, the ATA only certifies translators in 18 languages — and 32 language pairs — while the NAATI certifies translators and interpreters in 60 languages other than English. Few other organizations offer certification in so many languages.

Certifications and degrees can help provide some reassurance that a linguist can be trusted. But how can an LSP trust a linguist who works in a language or subject for which certification is not readily available?

Partial solution: Experience and reputation

When asked about these exceptions, another LSP might point to reputation and experience. According to EN 15038, the European Standard for Translation Services, if a translator does not gain professional competencies through “formal higher education in translation (recognized degree)” then the translator can gain those competencies through either “equivalent qualification in any other subject plus a minimum of two years of documented experience in translating” or “at least five years of documented professional experience in translating.”

Such documented experience and a reputation verified by others can also provide great reassurance about a linguist’s skill level. Sometimes such experience and reputation are sufficient, and other times they are not even an option.

Long tail languages

Once a company knows to trust at least one translator in a specific language pair and subject specialization, that trusted translator can evaluate additional candidates. However, eventually a company will encounter requests for translation in languages or subject areas for which certifications or reputations are not enough to prescreen language talent.

Is it enough to trust any alleged translator whose contact information is available online and simply hope for the best? Trusting without verifying has led to memorable mishaps like the sign language interpreter who warned of “pizza” and a “bear monster” during a 2017 emergency press briefing instead of warning about Hurricane Irma.

Some companies can afford to risk completing projects with such uncertainty, but others need more reassurance that the translators will complete the job as expected before obtaining feedback from end users. What can be done?

Round robin evaluations

Over the last decade, companies have been identifying creative semiautomated methods to evaluate translation quality when uncertainty would otherwise be high. Many translation crowdsourcers like Facebook have implemented voting systems to increase the probability that they would identify the most acceptable translations from users who are nonprofessional translators. Even Duolingo has compared translated sentences from many beginning language learners to identify the ones with the highest probability of being acceptable.

Facebook, Duolingo and others have been doing this primarily with the most common language pairs, for which they can obtain large numbers of votes or translation comparisons. Regardless, a similar comparison model can be used to reduce uncertainty when evaluating new potential language talent in long tail languages.

Round robin translation evaluations can be an effective method to identify the best translators from a small group of candidates. You can complete a round robin translation evaluation in three basic steps.

1. Each talent candidate submits a sample translation test

After a talent manager identifies a small group of candidates — perhaps four to six candidates — based on relevant criteria and available information, you can administer the test. Ask each candidate to translate a sample.

Creating an effective sample test is a topic that deserves an article of its own, but the following tips should be followed at a minimum. To make the test as relevant and cost-effective as possible, the sample will ideally be a portion of text that already requires translation, not a throwaway text on an unrelated topic. The sample should not be too long, but it should be long enough to assess a translator’s ability. Finally, when possible, make the test more realistic by confirming that you are sending the test at a time the candidate is available so you can impose a reasonable time limit.

2. Each talent candidate reviews and rates the other translations

After collecting the translated samples, send a copy of each translation to each candidate and request a rating for each translation. Tips for rating and evaluating translations also deserve a separate article, but it is good to keep the rating system simple and consistent across all the reviewers, so you can easily compare the results.

Personally, I like to give the reviewers a scale to express how much they would like reviewing that translator’s work. In other words, each candidate can imagine having to review another candidate’s work at a fixed word rate and indicate which translators they would want to review more and which they would want to review less.

3. Rating patterns help the requestor to identify the best candidates relative to their peers

After collecting the ratings from each reviewer and grouping the ratings according to each translation candidate, patterns will begin to emerge. In most cases, it will be clear which one or two candidates are esteemed to be the best among their peers. These results from a small round robin may not provide as much confidence as one would gain from seeing hundreds of evaluations or an evaluation performed by a single trusted reviewer. However, these results from a small round robin do increase the odds that you will start with the best available translators.

With these results, the newly most trusted translator can help you decide whether to add to the team other candidates who received mixed ratings during the round robin. Ongoing evaluations and feedback from clients and end users will then help you to refine your team of translators in this new subject or language pair.
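As a rough illustration of step 3, the short Python sketch below groups peer ratings by translator and sorts the candidates by average rating. Everything specific in it is an assumption made for illustration: the 1-to-5 scale, the candidate labels A through D, the sample scores and the function name rank_candidates. It is one simple way to surface the patterns described above, not a prescribed scoring method.

```python
from statistics import mean

# Hypothetical data: ratings[reviewer][translator] = score on an assumed
# 1-5 scale (5 = "I would gladly review this translator's work again").
ratings = {
    "A": {"B": 4, "C": 2, "D": 5},
    "B": {"A": 3, "C": 2, "D": 5},
    "C": {"A": 3, "B": 4, "D": 4},
    "D": {"A": 2, "B": 4, "C": 1},
}


def rank_candidates(ratings):
    """Group ratings by the translator who received them and rank by average.

    Returns a list of (translator, average, spread) tuples, best first.
    The spread (highest minus lowest rating received) is a crude signal
    of how mixed the peer ratings for that translator were.
    """
    received = {}
    for reviewer, scores in ratings.items():
        for translator, score in scores.items():
            if translator == reviewer:
                continue  # ignore any self-rating
            received.setdefault(translator, []).append(score)

    summary = [
        (translator, mean(scores), max(scores) - min(scores))
        for translator, scores in received.items()
    ]
    # Highest average first
    return sorted(summary, key=lambda row: row[1], reverse=True)


for translator, average, spread in rank_candidates(ratings):
    print(f"{translator}: average {average:.2f}, spread {spread}")
```

With these made-up scores, candidates D and B come out ahead of A and C. A wide spread for a candidate would flag the kind of mixed ratings that are worth a second look from the newly trusted translator.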

Disclaimer: Round robin evaluations are not perfect

Hopefully, many readers are already feeling cautious about how this might go wrong. It is not a perfect solution, so we must be aware of the possible risks.

First, it is possible that all candidates in a single round robin are unacceptable, and that might not show in the ratings. This risk is reduced by prescreening candidates as thoroughly as possible and increasing the size of a round robin, but the method is not foolproof.

Second, it is possible for candidates to sabotage a round robin. This happened in 2010, when a group of volunteer Turkish translators banded together to game the voting system of a crowdsourced translation project, and the result was a Facebook interface filled with expletives. Such sabotage is less likely to come from self-proclaimed professional translators who have been individually vetted as much as possible, but it is admittedly still possible.

Third, we must acknowledge that the best translators are not always the best reviewers or evaluators. Translation, review and evaluation may be related skills, but they are different. It is possible to identify excellent translators who are horrible reviewers and evaluators, and vice versa. Thus, translators are not always perfectly suited to review and rate other translators as they do in a round robin evaluation.

Regardless of its limitations, a round robin translation evaluation is one more option to add to what should be a thorough list of methods for screening and evaluating translators. The method helps to reduce uncertainty in language pairs and specialized subjects where certifications and reputations are limited.