Creative destruction in the localization industry

By Ameesh Randeri May 16, 2018

The concept of creative destruction was derived from the works of Karl Marx by economist Joseph Schumpeter. Schumpeter elaborated on the concept in his 1942 book Capitalism, Socialism, and Democracy, where he described creative destruction as the “process of industrial mutation that incessantly revolutionizes the economic structure from within, incessantly destroying the old one, incessantly creating a new one.”

What began as a concept in economics is now used broadly to describe breakthrough innovation that requires invention and ingenuity, as well as breaking apart or destroying the previous order. To look for examples of creative destruction, just look around you. Artificial intelligence, machine learning and automation are creating massive efficiency gains and productivity increases, but they are also causing millions to lose jobs. Uber and other ride-hailing apps worldwide are revolutionizing transport, but many traditional taxi companies are suffering. The Industrial Revolution was creative destruction playing out across industries in a short span of time. Millions lost their livelihoods, but industry transformed and, in the process, millions of new jobs were created. Textile manufacturing output rose dramatically and a new era of transport came about.

The process of creative destruction and innovation is accelerating over time. To understand this, we can look at the Schumpeterian (Kondratieff) waves of technological innovation.

We are currently in the fifth wave of innovation, ushered in by digital networks, the software industry and new media. The effects of the digital revolution can be felt across the spectrum. The localization industry is no exception and is undergoing fast-paced digital disruption. A confluence of technologies in localization tools and processes is ushering in major changes.

The localization industry: Drawing parallels from the Industrial Revolution

All of us are familiar with the Industrial Revolution. It commenced in the second half of the 18th century and went on until the mid-19th century. As a result of the Industrial Revolution, we witnessed a transition from hand production methods to machine-based methods and factories that facilitated mass production. It ushered in innovation and urbanization. It was creative destruction at its best. Looking back at the Industrial Revolution, we see that there were inflection points, following which there were massive surges and changes in the industry. For textiles, the inflection point was the invention of the spinning jenny. For transportation, it was the invention of the steam engine.

The fifth Schumpeterian wave of innovation has brought about massive advancements in digital networks, software, information and communication systems and so on. The localization industry has benefited from these advancements.

Prior to the Industrial Revolution, garments and textiles were generally homemade and the process was extremely manual and inefficient. Translation has likewise historically been a human and manual task. A translator looks at the source text and translates it while keeping in mind grammar, style, terminology and several other factors. The translation throughput is limited by a human’s productivity, which severely limits the volume that can be translated in a given time. In 1764, James Hargreaves invented the spinning jenny, a machine that enabled an individual to produce multiple spools of thread simultaneously. Inventor Samuel Crompton innovated further and came up with the spinning mule, further improving the process. Next was the mechanization of cloth weaving through the power loom, invented by Edmund Cartwright. These innovators and their inventions completely transformed the textile industry.

For the localization industry, a similar innovation is machine translation (MT). Though research into MT had been going on for many years, it went mainstream post-2005. Rule-based and statistical MT engines were created, which resulted in drastic productivity increases. However, the quality was nowhere near what a human could produce and hence the MT engines became a supplemental technology, aiding humans and helping them increase productivity.

Productivity gains ranged from 30% to 60%, depending on the language and the engine used. There was fear that translators’ roles would diminish. But rather than diminish, their role evolved into post-editing.

The real breakthrough came in 2016 when Google and Microsoft went public with their neural machine translation (NMT) engines. The quality produced by NMT is not yet flawless, but it seems to be very close to human translation. It can also reproduce some of the finer nuances of writing style and creativity that were lacking in the rule-based and statistical machine translation engines. NMT is a big step forward in reducing the human footprint in the translation process. It is without a doubt an inflection point and, while not perfect yet, it has the same disruptive potential as the spinning jenny and the power loom. It brings sharp productivity increases and lower prices and, since a machine is behind it, the volumes that can be managed are effectively unlimited. Hence it renews concerns about whether translators will be needed. It is to the translation industry what the spinning jenny was to textiles, where several manual workers were replaced by machines. What history teaches us, though, is that although jobs tied to the existing task or technology are lost, newer ones are created to support the newer task or technology.

Before the 1860s, steel was an expensive product because of the difficulty and complexity in making it. Similarly, the localization process has its fair share of complexities. There are multiple factors involved, such as source formats, hundreds of content creators within a company and a lot of manual touchpoints: file transfers, schedule and cost tracking and merging translated content back into source, to name just a few.

The complexities in the process reduce efficiency and productivity. This means that companies face severe restrictions in going global or expanding quickly into a broad set of languages, since doing so would require a lot of resources. In the steel industry, two inventors charted a new course: Abraham Darby, who created a cheaper, easier method to produce cast iron using a coke-fueled furnace, and Henry Bessemer, who invented the Bessemer process, the first inexpensive process for mass-producing steel. The Bessemer process revolutionized steel manufacturing by decreasing its cost from £40 per long ton to £6–7 per long ton. Besides the reduction in cost, there were major increases in speed and the need for labor decreased sharply.

The localization industry is seeing the creation of its own Bessemer process, called continuous localization. Simply explained, it is a fully connected and automated process where the content creators and developers create source material that is passed for translation in continuous, small chunks. The translated content is continually merged back, facilitating continuous deployment and release. It is an extension of the agile approach and it can be demonstrated with the example of mobile applications, where the latest updates are continually pushed through to our phones in multiple languages. To facilitate continuous localization, vendor platforms or computer-assisted translation (CAT) tools need to be able to connect to client systems, or clients need to provide CAT tool-like interfaces for vendors and their resources to use. The process would flow seamlessly from the developer or content creator creating content to the post-editor editing the machine-translated content. The Bessemer process in the steel industry paved the way for large-scale continuous and efficient steel production. Similarly, continuous localization has the potential to pave the way for large-scale continuous and efficient localization, enabling companies to localize more, into more languages, at lower prices.
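To make the idea of "continuous, small chunks" concrete, here is a minimal sketch in Python of how a connector might pick out only the new or changed source strings on each run. The function names (`fingerprint`, `changed_strings`, `record_sent`) and the idea of keeping a fingerprint per string ID are assumptions made for illustration; they do not describe any particular CAT tool or vendor platform.

```python
from __future__ import annotations

import hashlib


def fingerprint(text: str) -> str:
    """Return a stable hash of a source string, used to detect changes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_strings(source: dict[str, str], seen: dict[str, str]) -> dict[str, str]:
    """Return only the strings that are new or changed since the last run.

    `source` maps string IDs to the current source text.
    `seen` maps string IDs to the fingerprint recorded when the string
    was last sent for translation (hypothetical bookkeeping store).
    """
    return {
        key: text
        for key, text in source.items()
        if seen.get(key) != fingerprint(text)
    }


def record_sent(batch: dict[str, str], seen: dict[str, str]) -> None:
    """Remember what was sent so the next run skips unchanged strings."""
    for key, text in batch.items():
        seen[key] = fingerprint(text)
```

On a first run every string is sent. On later runs only edited or newly added strings enter the translation pipeline, which is what keeps each hand-off small.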

There were many other disruptive technologies and processes that led to the Industrial Revolution. For the localization industry as well, there are several other tools and process improvements in play.

• Audiovisual localization and interpretation: This is a theme that began evolving in recent years. Players like Microsoft-Skype and Google have made improvements in the text-to-speech and speech-to-text arena. Text-to-speech has become more human-like, though it isn’t there yet. Speech-to-text has improved significantly as well, with the output quality going up and errors reducing. Interpretation is the other area where we see automated solutions springing up. Google’s new headphones are one example of automated interpretation solutions.

• Automated terminology extraction: This is one that hasn’t garnered as much attention and focus. While there is consensus that terminology is an important aspect of localization quality, it always seems to be relegated to a lower tier from a technological advancement standpoint. There are several interesting commercial as well as open source solutions that have greatly improved terminology extraction and reduced the false positives. This area could potentially be served by artificial intelligence and machine learning solutions in the future.

• Automated quality assurance (QA) checks: QA checks can be categorized into two main areas – functional and linguistic. In terms of functional QA, automations have been around for several years and have vastly improved over time. There is already exploration on applying machine learning and artificial intelligence to functional automations to predict bugs, to create scripts that are self-healing and so on.

Linguistic QA on the other hand has seen some automation primarily in the areas of spelling and terminology checks. However, the automation is limited in what it can achieve and does not replace the need for human checks or audits. This is an area that could benefit hugely from artificial intelligence and machine learning.

• Local language support using chatbots: Chatbots are fast becoming the first level of customer support for most companies. Most chatbots are still in English. However, we are starting to see chatbots in local languages powered by machine translation engines in the background, thus enabling local language support for international customers.

• Data (big or small): While data is not a tool, technology or process by itself, it is important to call it out. Data is central to a lot of the technologies and processes mentioned above. Without a good corpus, there is no machine translation. For automated terminology extraction and automated QA checks, the challenge is to have a corpus of data big enough to train the machine. In addition, metadata becomes critical. Today metadata is important to provide translators with additional contextual information, to ensure higher quality output. In the future, metadata will provide the same information to machines – to a machine translation system, to an automated QA check and so on. This highlights the importance of data!
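The terminology checks mentioned under linguistic QA above can be illustrated with a short Python sketch. It assumes a glossary that maps each source term to one approved target term and flags segments where the source term appears but the approved translation does not. The names `TermIssue` and `check_terminology` are hypothetical, and the matching is deliberately naive: it ignores inflected forms, which is one reason such checks still need human review.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class TermIssue:
    """One segment where an approved target term appears to be missing."""
    segment_id: str
    source_term: str
    expected_target: str


def check_terminology(
    segments: list[tuple[str, str, str]],
    glossary: dict[str, str],
) -> list[TermIssue]:
    """Flag segments whose target text lacks the approved term.

    `segments` holds (segment_id, source_text, target_text) tuples.
    `glossary` maps a source term to its approved target term.
    """
    issues = []
    for segment_id, source, target in segments:
        for source_term, target_term in glossary.items():
            # Whole-word, case-insensitive match on the source side.
            pattern = rf"\b{re.escape(source_term)}\b"
            if not re.search(pattern, source, re.IGNORECASE):
                continue
            # Naive substring match on the target side (no inflection handling).
            if target_term.lower() not in target.lower():
                issues.append(TermIssue(segment_id, source_term, target_term))
    return issues
```

The output is a list of candidates for a reviewer to confirm or dismiss, not a verdict on the translation.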

The evolution in localization is nothing but the forces of creative destruction at work. Each new process or technology is destroying an old way of operating and creating a new way forward. It also means that old jobs are being made redundant while new ones are being created.

A glimpse into the future

Let us meld these seemingly disparate but innovative technologies and processes together. What do we get? The overarching process is what you would call continuous localization. The starting point is the content creation process and this could be performed in multiple source control tools.

There would be a way to configure the process and specify parameters such as content type, languages, frequency of translation, turnaround time, machine translation engine to be used and so on. The content would be continually extracted into the translation pipeline based on new content created or changes made. An automated terminology extraction process would run, sending extracted terminology to a terminologist, who would pick term candidates and send these for translation. In parallel, the content would be machine translated using MT engines (neural, statistical or rule-based) and then sent for post-editing to vendor translators. Once finalized, the translated content would go through automated QA checks and a QA sheet would be published. The QA-ed content would be sent back to the source control tool for continuous deployment.

Independently, the translated and approved terminology would be pushed into the translated content. Quality audits and feedback from customers and end users would also come in and be integrated. Lastly, all the systems and processes would be interconnected and the data would be visible on dashboards. Systems would send automated alerts in case of issues or failures. Customer support would be provided in the local language using chatbots powered by MT engines.
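The core of the workflow described above (configuration, machine translation, post-editing, automated QA) can be sketched in Python as a configurable chain of stages. Everything here is an assumption made for illustration: the class and function names are hypothetical, the machine translation stage only tags the source text instead of calling a real engine, and post-editing is a placeholder for the human step.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    """Parameters a content owner might set (hypothetical field names)."""
    content_type: str
    target_languages: list[str]
    mt_engine: str  # e.g. "neural", "statistical" or "rule-based"
    turnaround_hours: int


@dataclass
class Job:
    """One batch of source strings headed for one target language."""
    source: dict[str, str]
    language: str
    translations: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


def machine_translate(job: Job, config: PipelineConfig) -> Job:
    # Placeholder: tag the text instead of calling a real MT engine.
    job.translations = {
        key: f"[{config.mt_engine}:{job.language}] {text}"
        for key, text in job.source.items()
    }
    job.log.append("machine_translate")
    return job


def post_edit(job: Job, config: PipelineConfig) -> Job:
    # Placeholder for the human post-editing step.
    job.log.append("post_edit")
    return job


def automated_qa(job: Job, config: PipelineConfig) -> Job:
    # Minimal check: every source string must have a non-empty translation.
    missing = [key for key in job.source if not job.translations.get(key)]
    if missing:
        raise ValueError(f"untranslated strings: {missing}")
    job.log.append("automated_qa")
    return job


STAGES = [machine_translate, post_edit, automated_qa]


def run_pipeline(source: dict[str, str], config: PipelineConfig) -> list[Job]:
    """Run every stage, in order, once per configured target language."""
    jobs = []
    for language in config.target_languages:
        job = Job(source=dict(source), language=language)
        for stage in STAGES:
            job = stage(job, config)
        jobs.append(job)
    return jobs
```

Keeping the stages in a plain list is a design choice that mirrors the configurability described above: a terminology step or an alerting step could be added or removed without touching the others.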

This is a rather simplistic way of looking at the entire process but, essentially, we are already staring at this reality: a seamless, automated, configurable workflow that is as close to push button as the current technology stack allows. It is a workflow and process that allows humans and technology to complement each other and work together to increase efficiency and productivity.

Taking this a step further, let us look at the human touches in the process that we just described. In the longer term, as technology evolves, I think we will be able to reduce or eliminate those elements as well. I believe things like terminology translation, QA audits and defect fixing could be automated as machine learning and artificial intelligence evolve. The role of humans will be to configure processes and tools, manage and maintain them, monitor the process and manage relationships with internal and external stakeholders.

The evolution of localization departments and localization vendors

Johan Aurik, CEO and chairman of A.T. Kearney, presented a paper at the World Economic Forum titled “The Case for Automating Leadership.” The title is cheeky, but the point being made was not that leadership can be automated. Rather, there are tasks of leaders or managers that can be fully automated so that they can focus their time on things that humans are better at: relationship management, issues requiring judgement, creativity and so on.

When we look at all the technologies and process improvements in the localization industry, we can see that leadership is being automated as well. A project manager or program manager used to spend time setting up the localization process, deciding on the translation process, QA process, vendor to use, creating schedules, manually shuttling files from content creator to vendor to reviewer and back and forth several times, working on the release of the product or asset, following up on feedback from customers and so on. Localization engineers would set up projects in the CAT environment, manually or via a parser, extract localizable content and push it into the CAT tool. Since file formats were not localization friendly, a lot of time was spent on reverse engineering and reengineering, defect fixing and so on.

Continuous localization is close to a push-button process and in the mid- to long-term, we surely will see something that is push button. Content creators will be able to self-service and set up localization processes with assistance from localization teams. Content will flow directly to post-editors and back. Manual file hand-offs will cease, schedules will become redundant, and QA and terminology processes will run independently of the translation process. Reverse engineering and reengineering will decrease or cease, and bugs, and hence bug-fixing, should decline as localization-friendly file formats and engineering methods are adopted. As localization teams on the buyer and seller sides gradually adopt such a fully automated process, the team setups will morph as well.

Since a lot of the ideas that we discuss are technology driven, we will see an expansion in technical teams on both the seller and the buyer sides. The traditional roles of project management, localization engineering, translation and localization QA will shrink. Buyers of localization services with strong technical teams may create their own localization tools, platforms and MT engines. However, a vast majority of buyers that do not have this expertise will increasingly rely on the sellers to fill this gap. Hence, sellers will need to transform into strong technology companies. They will have to go beyond the traditional localization engineering that involved scripting and automating small pieces and evolve into technology experts dabbling in big data, machine learning, neural networks, platforms, APIs and so on. Sellers that were technology service providers and expanded into the localization business may have an edge, depending on how quickly they pivot their strategy and decide to invest in localization technology.

An interesting idea could be a pan-industry effort to create an open source localization platform. There have been similar efforts in the past on linguistic quality processes or other parts of the process but this would be larger and more beneficial. Any and every buyer and seller could contribute as well as use the platform. They could contribute by creating APIs and micro-services for doing a variety of tasks such as globalization checks, process configuration, segmentation, machine translation, terminology extraction, text-to-speech, speech-to-text, QA automation, business intelligence, analytics and more.

How far is this future? Well, the entire process is extremely resource and technology intensive. Many companies will require a lot of time to adopt these practices. This provides the perfect opportunity for sellers to spruce up their offering and provide an automated digital localization solution. Companies with access to abundant resources or funding should be able to achieve this sooner. This is also why a pan-industry open source platform may accelerate this transformation.