In the words of Brad Ross from Lingotek, one of the most difficult aspects of buying translation services is evaluating whether you’re getting the quality you’re paying for.
For my company, it was also a question of whether our performance meets clients’ expectations, and to what extent. Like most language service providers (LSPs), we keep track of the quality scores our customers give us. In the process, we couldn’t help noticing that scoring systems differ from client to client. Whether it is LISA, DQF or any other model, we need to make sure that our linguists can meet the requirements of them all.
So what did we do to align various systems?
We developed our own scoring method to track performance of the in-house and freelance linguists.
In 2015, we established a dedicated quality assurance (QA) department that consisted of two independent QA specialists, and with their help, an automated translator rating system was deployed. Using this system, QA specialists revise and evaluate the performance of regular linguists, allowing us to tally a weighted score for the areas and genres that the translator or editor claims to be proficient in. At first, this approach helped to:
Effectively draft a team for a particular project.
Prevent the involvement of incorrect resources (with lower scores for a particular combination of area and genre).
Track the performance of new team members.
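The weighted per-area, per-genre score mentioned above can be sketched roughly as follows. This is a minimal illustration, not our production system; the field names and the use of the evaluated sample’s word count as the weight are assumptions.

```python
from collections import defaultdict

def weighted_scores(evaluations):
    """Aggregate per-(area, genre) ratings for one linguist.

    `evaluations` is a list of dicts with hypothetical keys:
    'area', 'genre', 'score' (1-5) and 'words' (size of the
    evaluated sample, used here as the weight).
    """
    # (area, genre) -> [weighted sum of scores, total weight]
    totals = defaultdict(lambda: [0.0, 0.0])
    for e in evaluations:
        key = (e["area"], e["genre"])
        totals[key][0] += e["score"] * e["words"]
        totals[key][1] += e["words"]
    return {k: round(s / w, 2) for k, (s, w) in totals.items()}

evals = [
    {"area": "IT", "genre": "technical", "score": 4.5, "words": 1200},
    {"area": "IT", "genre": "technical", "score": 3.5, "words": 400},
    {"area": "IT", "genre": "marketing", "score": 4.0, "words": 800},
]
print(weighted_scores(evals))
# → {('IT', 'technical'): 4.25, ('IT', 'marketing'): 4.0}
```

Weighting by sample size keeps one short, unlucky job from dragging down a linguist’s score in a combination where most of their evaluated work was strong.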
But even allowing for the fact that we do not evaluate each and every job (projects are selected for evaluation based on their status in the database), seeds of doubt soon began to take root.
First of all, it became unclear how many ratings (or scores) are enough to claim that the system works. For instance, does a given distribution of scores show a realistic picture? How many ratings, and within what period of time, are enough to clearly indicate that a linguist has reached a certain level of competence?
We examined the 132 most recent ratings in the database and discovered that they more or less fall into a normal probability distribution (Figure 1).
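A quick way to eyeball such a distribution is a text histogram over the ratings. The data below are simulated stand-ins (the actual 132 ratings are internal), with assumed mean and spread, just to show the shape of the check.

```python
import random
from collections import Counter

# Illustrative data only: we simulate ratings on a 1-5 scale,
# rounded to half points, with an assumed mean of ~3.95.
random.seed(2)
ratings = [round(min(5.0, max(1.0, random.gauss(3.95, 0.8))) * 2) / 2
           for _ in range(132)]

# Bin into half-point buckets and print a bar per bucket; a roughly
# bell-shaped printout suggests an approximately normal distribution.
hist = Counter(ratings)
for bucket in sorted(hist):
    print(f"{bucket:>4}: {'#' * hist[bucket]}")
```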
Having a continuous random variable that is distributed approximately normally cheered us up a bit, because it suggested we were moving in the right direction.
Statistics, however, does not tell us exactly how many evaluations are sufficient. It can only tell us whether the estimates we already have would hold up as the data set grows, or whether we can be satisfied with the numbers we have because they are stable at a given confidence level, for example 95%. Indeed, with our data set, the average rating lies in the interval (3.8, 4.1) with 95% probability, so it is fairly stable.
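That interval calculation can be sketched with a normal-approximation confidence interval for the mean. The scores below are simulated (the real ratings are not public), and the distribution parameters are assumptions chosen to resemble the numbers above.

```python
import math
import random
import statistics

def mean_ci(scores, z=1.96):
    """Normal-approximation 95% confidence interval for the mean score."""
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores))
    return m - z * se, m + z * se

# Simulated stand-in for the 132 ratings; mean/spread are assumptions.
random.seed(1)
scores = [min(5.0, max(1.0, random.gauss(3.95, 0.8))) for _ in range(132)]
low, high = mean_ci(scores)
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```

As the number of ratings grows, the standard error shrinks with the square root of the sample size, so the interval narrows — which is exactly the “are we satisfied yet?” question in numerical form.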
But delving deeper into the statistics, we realized that the right number of areas and genres was still unknown. Check a random client translation management system, or a competent workforce search resource such as proz.com, and you’ll see ambitious lists of subject matters that linguists claim to be proficient in, from cochlear implantation to hentai games. What is the average here? How can we identify a few solid genres and areas that would accurately describe a translation project?
For starters, why not cut it down to one subject matter — why introduce two different entities, such as genre and area?
This is actually dictated by the requirements of the modern localization industry. It is no longer enough to be just a good technical translator; clients are interested in your skills in a particular field, such as “IT + marketing,” “motorcycles + creative” and the like. Therefore, we introduced two subsections, genre and area, that together describe the nature of the content to be localized fairly precisely.
When we began, we had 13 genres and 13 areas, with more entries being introduced quarterly. The QA expert considered that a translator’s rating in such a fragmented environment was not informative, and suggested reducing the number of genres to three: technical, legal and marketing. After due consideration (and with a little help from mathematical statistics), we decided on six genres and ten subject areas, and transposed all existing ratings onto these 16 values. This significantly simplified the process and reduced the number of ratings to be calculated. But larger LSPs with broader horizons may need more values.
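Transposing the old ratings onto the consolidated taxonomy amounts to re-keying them through a mapping table. The category names below are hypothetical examples; the actual taxonomy is internal to the LSP.

```python
# Hypothetical mapping from fine-grained genres to consolidated ones
# (only three targets shown here for brevity; we ended up with six).
GENRE_MAP = {
    "user manuals": "technical",
    "datasheets": "technical",
    "contracts": "legal",
    "patents": "legal",
    "advertising": "marketing",
    "press releases": "marketing",
}

def transpose_rating(rating):
    """Re-key an existing rating onto the consolidated taxonomy."""
    new = dict(rating)
    # Genres already in the new taxonomy pass through unchanged.
    new["genre"] = GENRE_MAP.get(rating["genre"], rating["genre"])
    return new

old = {"genre": "patents", "area": "pharma", "score": 4.2}
print(transpose_rating(old))
# → {'genre': 'legal', 'area': 'pharma', 'score': 4.2}
```

Because several old genres collapse into one new one, each consolidated combination accumulates ratings faster, which is precisely what makes the averages statistically meaningful sooner.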
Thus, after all these measurements, we can say that a quality management strategy must include not only the quality evaluations themselves, but also constant analysis of their quantity, intensity and cost.
And ideally, scores should result in the adjustment of rates and the allocation of work to linguists. With key performers regularly rated, you can assure your customers that the work they assign to you is in competent hands. Because in the long run, what is more important: measuring quality or delivering it?