The calculus of global content

By Donald A. DePalma October 20, 2016

Translation buyers and suppliers face the challenges of massive content volumes, along with demands for faster turnaround times and more target languages, all while dealing with flat budgets. Some look to machine translation (MT) as the solution. At the same time, mainstream media like The Wall Street Journal report the criticisms of linguists and other specialists who decry the quality of MT. Yet the math of handling big-data volumes establishes a clear role for MT in meeting these challenges.

Already huge, the digital universe of content, code and structured data grows by a mind-blowing amount every 24 hours. According to IBM, each day the world creates another 2.5 quintillion bytes of data. That daily infusion will pump the global repository of information from 7.9 zettabytes (7.9 × 10²¹ bytes) in 2015 to 176 zettabytes by 2025.

To be useful, much of this content requires transformation for different channels (such as web, mobile and print), conversions for various applications and localization for multiple markets. Let’s consider just the requirements to translate data into languages other than English to make information available to a broader audience. CSA Research’s annual report about the multilingual capabilities of thousands of websites shows that it takes 14 languages to reach 90% of the world’s most economically active populations. However, most websites max out with support for just six languages or locales. Product and document localization at many companies lags even more.

Why don’t companies translate more? Blame it on that quadruple challenge of volume, time, targets and budget. Whatever the reason, this absence of translation leaves many people outside looking in — and if they can’t read, our findings on the language preferences and behaviors of both consumers and business buyers show that they won’t buy.

However, the impact extends far beyond how much profit can be extracted from today’s active online consumers. Look beyond commercial return on investment to basic physical and safety needs of the rest of the world’s population. Telecommunication manufacturers predict there will be more than 6.1 billion smartphone subscriptions by 2020. This growing cohort of mobile users will raise the requirement for the digital universe to support many more languages. Speech, handwriting, wearables, sensors and the rest of the Internet of Things add even more content, input methods and delivery modes. Most of this data may not be translated, but it must still be delivered in human language.

So what gets translated in this content-rich, multilingual, ever-more-online-devices world? Not enough. Let’s ignore the massive amount of digital content that already exists. Instead, let’s focus on daily digital content creation. How much of a given day’s output might actually be translated if the entire language industry were working on just that content and none of the backlog of existing data?

First, calculate the expenditure for outsourced services in 2015. Translation in various forms (human, post-edited, transcreation, plus website globalization and text-centric localization) accounts for $26.4 billion of last year’s $38.1 billion language market. Next, calculate the daily output in words. To do that, divide the size of the annual market by 365 days, which puts the translation sector at about $72 million per day (see graphic). At a hypothetical rate of 20 cents per word, professional translators process nearly 362 million words every day. Then, convert that amount to bytes: at an average of 9.7 characters per word and two bytes per character, it equates to about seven billion bytes. Some languages have fewer characters per word on average and others have more, so this is just a starting point for the calculation that you can adjust up or down.
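The steps in this calculation can be reproduced in a few lines of Python. The sketch below uses only the figures given in the text. The function and parameter names are illustrative assumptions, and the 20-cent rate, 9.7 characters per word and two bytes per character are the hypothetical averages stated above, not measured values.

```python
def daily_translation_output(
    annual_market_usd=26.4e9,  # 2015 translation spend cited in the text
    rate_usd_per_word=0.20,    # hypothetical per-word rate from the text
    chars_per_word=9.7,        # average characters per word from the text
    bytes_per_char=2,          # double-byte characters, as assumed in the text
):
    """Return (daily spend in USD, words per day, bytes per day)."""
    daily_spend = annual_market_usd / 365
    words_per_day = daily_spend / rate_usd_per_word
    bytes_per_day = words_per_day * chars_per_word * bytes_per_char
    return daily_spend, words_per_day, bytes_per_day


spend, words, size = daily_translation_output()
print(f"Daily spend:   ${spend / 1e6:.1f} million")
print(f"Words per day: {words / 1e6:.0f} million")
print(f"Bytes per day: {size / 1e9:.1f} billion")
```

Because every input is a keyword argument, you can adjust the estimate for a different rate or language mix, for example `daily_translation_output(rate_usd_per_word=0.12, chars_per_word=6.0)`.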

Finally, compare that daily translation output with the daily volume of content creation. When you divide the roughly seven billion bytes of target-language content that language service providers produce by the 2.5 quintillion bytes created, you find that translation firms could potentially process only about 0.0000003% of the content created every day. However, it is safe to assume that much of that data will never be translated: some isn’t translatable and some doesn’t make sense to translate. Even if all but an infinitesimal percentage of those daily bytes is thrown out, the share of content outsourced for translation is still far less than 1%. And as CSA Research’s data shows, if it’s translated at all, it’s typically into just a few languages. This translation estimate is source-centric, doesn’t reflect the many languages in which this daily output of content should be accessible and doesn’t at all address the zettabytes of existing content.
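As a rough check on the order of magnitude, the comparison can be sketched in Python using the inputs stated in the text: a $26.4 billion annual market, 20 cents per word, 9.7 characters per word, two bytes per character and 2.5 quintillion bytes created per day. The function name and the `translatable_fraction` parameter are assumptions introduced for illustration. The 1-in-100,000 value in the second call is an arbitrary example, not a figure from CSA Research. Under these inputs the share comes out on the order of 10^-7 percent; different rounding or byte-width assumptions shift the exact figure but not the conclusion.

```python
DAILY_DATA_BYTES = 2.5e18  # 2.5 quintillion bytes per day (IBM figure cited in the text)


def translated_share_percent(translated_bytes, created_bytes=DAILY_DATA_BYTES,
                             translatable_fraction=1.0):
    """Percent of the day's translatable data that the industry could cover.

    translatable_fraction is a hypothetical knob: the portion of daily
    data assumed to be worth translating at all (1.0 means all of it).
    """
    return translated_bytes / (created_bytes * translatable_fraction) * 100


# Daily industry output in bytes, derived from the figures in the text.
translated_bytes = 26.4e9 / 365 / 0.20 * 9.7 * 2

share_all = translated_share_percent(translated_bytes)
# Arbitrary illustration: keep only 1 byte in 100,000 as translatable.
share_filtered = translated_share_percent(translated_bytes,
                                          translatable_fraction=1e-5)

print(f"Share of all daily data:   {share_all:.1e} percent")
print(f"Share after heavy discard: {share_filtered:.3f} percent")
```

Even after discarding 99.999% of the daily data in this illustration, the covered share stays well below 1%.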

Of course, other variables could be built into the calculation. One is the business decision to skip translation for markets for which you should translate but don’t. Another is translation conducted outside the outsourced market by in-house translation teams, bilingual employees pressed into service or distributors in foreign markets. But such work is a drop in the ocean and doesn’t affect the overall amounts.

What does this hypothetical exercise tell us? The traditional model for global business is to translate everything, just in case someone needs it. That’s a no-win situation given today’s volumes. The successful approach will be two-pronged: 1) localize in advance the absolutely essential content for key languages, and 2) make any content for any language available on demand. This just-in-time model means accepting lower-quality output in some cases and embracing MT technologies, but at the same time making information accessible when people need it. In the final analysis, MT will help both buyers and suppliers translate more information for more markets within real-world timeframes and budgets.