Cloud translation process, current and future

By Andrzej Zydroń December 17, 2013

Cloud translation is making an ever-increasing impact on the localization industry. This is not surprising, as the cloud is a natural environment for localization: translation by its very nature is a collaborative process involving project managers, translators, reviewers and correctors. A centralized, coordinated system is a natural fit for such a process.

The first generation of translation management systems (TMS) and computer-assisted translation (CAT) systems, which originated around the turn of the century, was tied to the desktop. These systems were based on the concept of a central server with desktop CAT tools, and collaboration involved e-mailing files. This is both time consuming and error prone by its very nature, and it also results in little islands of isolated data. It is difficult for individual actors in this scenario to share large translation memories and terminology files. For large projects involving many translators, inconsistencies in translation could and did easily result, with repeated text being translated differently by individual translators.

There was also the problem of installing, maintaining and supporting desktop software as well as the issue of licenses. With many of these first-generation TMSs, translators were required to use a specific CAT tool, which meant they had to either purchase it themselves or install it with a license provided for the duration of the project.

The traditional approach by software publishers regarding TMS and CAT software is the sale of a one-off license. This usually includes free support for the first year, followed by maintenance and support fees of 15% to 20% for subsequent years. In addition, it is not uncommon for publishers to withdraw support for a version after three years and force users to buy an upgrade to the latest version. This practice has many disadvantages for customers, the main one being a large one-off financial outlay.

Cloud translation systems involve a different approach: you pay monthly, so there is no large initial financial outlay. Additionally, you can vary your licenses according to demand. No organization has a constant demand for translation. It is usually feast or famine: one month you are snowed under with work, and the next, things can be very quiet. The ability to adjust your licenses means you pay only for what you need. With the traditional TMS and CAT vendor approach, if you suddenly acquire a large project that requires 30 more licenses, you are forced to buy the additional licenses even though you will no longer need them in two months. With a cloud-based system, you should be able to vary your licenses from month to month or even week to week according to demand.
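The licensing arithmetic can be made concrete with a small sketch. Every figure below (seat price, monthly fee, the demand profile) is a hypothetical value chosen for illustration only; the one number taken from the text is the 20% upper bound on annual maintenance, with the first year of support treated as free.

```python
def perpetual_cost(peak_seats, price_per_seat, years, maintenance_rate=0.20):
    """One-off licenses sized for peak demand.

    The first year of support is assumed free; maintenance is charged
    on the purchase price for every subsequent year.
    """
    purchase = peak_seats * price_per_seat
    maintenance = purchase * maintenance_rate * max(years - 1, 0)
    return purchase + maintenance


def subscription_cost(seats_per_month, monthly_fee_per_seat):
    """Monthly licenses that follow demand: pay only for seats in use."""
    return sum(seats_per_month) * monthly_fee_per_seat


# Hypothetical year: 10 seats throughout, plus 30 extra seats for a
# two-month project (months 5 and 6).
demand = [10] * 12
demand[4] += 30
demand[5] += 30

if __name__ == "__main__":
    # Hypothetical prices: 1000 per perpetual seat, 50 per seat per month.
    print("perpetual, year 1:", perpetual_cost(max(demand), 1000.0, years=1))
    print("subscription, year 1:", subscription_cost(demand, 50.0))
```

With these assumed prices the perpetual route has to be sized for the 40-seat peak, while the subscription is billed for 180 seat-months in total. Different prices would shift the break-even point, so the sketch illustrates the shape of the comparison rather than a real quote.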

Thus, with the cloud approach, your TMS/CAT costs are a variable element of your business directly related to your sales, and not capital expenditure. In addition, as there is no software to install, you can be up and running with a system in minutes. All that is required is a browser and decent internet connection.

This considers the license fees only. In an integrated computing architecture, the standard Windows PC is also an expensive and inefficient tool to buy and maintain. This is not just from the perspective of any installed software but also from the aspect of backups, security and system maintenance. PCs are notoriously virus prone, and although on the face of it they appear not to require professional IT support, they are very complex to maintain. Just take the standard experience: after six to nine months of use, desktop software on Windows PCs often slows down significantly. This just gets worse over time, and the more software you install on a PC the worse things become. Every PC user has experienced it. Then there are the Trojans and viruses: antivirus software is by no means perfect at identifying them.

From many perspectives, the Windows desktop is in terminal decline, overtaken increasingly by tablets and other devices. PCs are expensive to manage and maintain outside of a tightly controlled and centralized corporate organization. In an integrated collaborative environment, desktop-based software has many drawbacks and inefficiencies, and for this reason its time is quickly passing.

The big advances in HTML standards and libraries have made the differences between individual browsers largely irrelevant, especially since the demise of Internet Explorer 6. The browser is now becoming the main tool through which we interact with centralized systems. Managing your banking and utility transactions as well as shopping via a browser is now commonplace. It is now also time to translate and manage translation online.

Cloud translation system design

The starting point of cloud system design should be standards. The OASIS Open Architecture for XML Authoring and Localization (OAXAL) initiative has produced standards-based templates that are well suited to a cloud-based translation system. At the heart of OAXAL are many of the existing localization standards: TMX, SRX, W3C ITS, TBX, xml:tm, Unicode and XLIFF (plus XLIFF:doc and TIPP) as well as XML itself (Figure 1). Designing a cloud translation process from scratch provides the opportunity to implement all relevant localization industry open standards.

In a world where over 90% of data for translation is already being generated in XML, it makes sense to base the internal data structure on XML. Formats that are not in XML, such as FrameMaker, HTML or RTF, can be easily converted to and from XML. Having one consistent electronic form makes for a very clean, efficient and elegant design. It also allows for the creation of a data driven automation approach based on open standards, where you have only one extraction and matching process rather than one for each different file format.
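A minimal sketch of what "one extraction process" might look like once everything is XML. The sample document, its element names and the `skip_tags` rule are all assumptions made for illustration; a production system would drive these decisions from standards such as W3C ITS rather than a hard-coded set.

```python
import xml.etree.ElementTree as ET


def extract_segments(xml_text, skip_tags=frozenset({"code"})):
    """Collect translatable text from any XML vocabulary in document order.

    skip_tags is a hypothetical, data-driven rule naming elements whose
    content must not be sent for translation.
    """
    root = ET.fromstring(xml_text)
    segments = []

    def walk(elem):
        if elem.tag not in skip_tags:
            if elem.text and elem.text.strip():
                segments.append(elem.text.strip())
            for child in elem:
                walk(child)
                # Text that follows a child still belongs to this element.
                if child.tail and child.tail.strip():
                    segments.append(child.tail.strip())

    walk(root)
    return segments


# Hypothetical source document, for example the result of converting
# a non-XML format to XML.
SAMPLE_DOC = (
    "<doc><title>Install guide</title>"
    "<p>Run the installer.</p>"
    "<code>setup.exe /s</code>"
    "<p>Restart the PC.</p></doc>"
)

if __name__ == "__main__":
    for segment in extract_segments(SAMPLE_DOC):
        print(segment)
```

The same function works unchanged on any other well-formed XML input, which is the point of the single-format design: only the rule set varies by document type, not the extraction code.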

The use of standards is also key: TMS and CAT tool publishers have not always been great at implementing and supporting standards. Take word counts as an example — the vast majority of tools each have their own proprietary way of counting words and characters. In fact, a major publisher after a recent acquisition had two CAT tools producing different word counts. In addition, one of the most widely adopted CAT tools had a tradition at one time of changing the word count methodology with every release. Unsurprisingly, the actual specifications of the proprietary counts are never published and thus cannot be verified. It is time for customers to end this nonsense by demanding support for GMX-V — the official standard for word and character counts.
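To see why unpublished counting rules matter, consider two perfectly plausible ways of counting words. Neither function below is the GMX-V algorithm or any vendor's actual method; both are simplified assumptions, used only to show how quickly the totals diverge.

```python
import re


def count_whitespace_tokens(text):
    """Count anything separated by whitespace as one word."""
    return len(text.split())


def count_alphanumeric_runs(text):
    """Count every unbroken run of letters or digits as one word."""
    return len(re.findall(r"[^\W_]+", text))


SAMPLE_TEXT = "State-of-the-art e-mail, 24/7."

if __name__ == "__main__":
    print(count_whitespace_tokens(SAMPLE_TEXT))  # 3
    print(count_alphanumeric_runs(SAMPLE_TEXT))  # 8
```

The same sentence yields 3 words under one rule and 8 under the other. When the count drives the invoice, only a published, verifiable specification lets both parties check the figure.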

Another aspect of the cloud translation process is the ability to interoperate with other TMS and CAT tools. Here again standards play a key role. The Linport initiative, which builds on XLIFF:doc and the Translation Interchange Package Protocol (TIPP) from the Interoperability Now! initiative, enables seamless integration with other TMS and CAT tools that support it. The original XLIFF 1.2 standard allows for too many incompatible implementations to permit true interoperability, and it does not cater to reference materials and terminology in the exchange.
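For readers unfamiliar with the exchange format, here is a rough sketch of reading the simplest possible XLIFF 1.2 file. The sample content and the `read_units` helper are illustrative assumptions; real files carry inline markup, notes and tool-specific extensions that this sketch ignores, and those are precisely the areas where implementations tend to diverge.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Hypothetical minimal file: one translated unit, one untranslated unit.
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="hello.txt" source-language="en" target-language="pl" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Hello</source><target>Witaj</target></trans-unit>
      <trans-unit id="2"><source>Goodbye</source></trans-unit>
    </body>
  </file>
</xliff>"""


def read_units(xliff_text):
    """Return (id, source, target) tuples; target is None if untranslated."""
    root = ET.fromstring(xliff_text)
    units = []
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        units.append((
            unit.get("id"),
            source.text if source is not None else None,
            target.text if target is not None else None,
        ))
    return units


if __name__ == "__main__":
    for unit_id, source, target in read_units(SAMPLE_XLIFF):
        print(unit_id, source, target)
```

Note that nothing in this file carries terminology or reference material, which is the gap that a package format built around the XLIFF payload is meant to fill.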

The server-based nature of cloud translation systems means they should feature a scalable design. This is achieved using a service-oriented architecture in which the individual components (analysis, extraction, translation memory management, terminology, quality assurance and spell checking, workflow and the actual translator workbench) are implemented as separate web-based services. This provides a highly scalable approach in which any individual component can be offloaded onto bigger and faster servers as the workload increases. Scalability must also include the ability to handle files and projects of arbitrary size. There should be no upper limits, and multiple actors such as translators, reviewers and correctors must be capable of working on the same files at the same time, all sharing translation assets such as terminology and translation memory in real time.

The cloud translation process must support flexible, customizable workflow management. This must include the ability to define your own workflow steps and the ability to inform all parties involved when individual steps are overdue. It is important to allow not only sequential workflow, but also concurrent steps such as reviewers and translators working in parallel. Transition from each workflow stage to the next must be automatic once the previous step has been completed.
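The workflow requirements above can be sketched in a few lines. The `Workflow` class, the step names and the dates are all hypothetical; the sketch only shows one way user-defined steps, parallel steps, automatic transitions and overdue detection might fit together, not how any particular TMS implements them.

```python
from datetime import date


class Workflow:
    """User-defined steps with prerequisites and due dates."""

    def __init__(self, steps):
        # steps maps a step name to (prerequisite step names, due date).
        self.deps = {name: set(deps) for name, (deps, _) in steps.items()}
        self.due = {name: due for name, (_, due) in steps.items()}
        self.done = set()

    def ready(self):
        """Steps whose prerequisites are all complete; several may run in parallel."""
        return sorted(
            name for name, deps in self.deps.items()
            if name not in self.done and deps <= self.done
        )

    def complete(self, name):
        """Mark a step done and return the steps that open up automatically."""
        if name not in self.ready():
            raise ValueError(f"step {name!r} is not ready")
        self.done.add(name)
        return self.ready()

    def overdue(self, today):
        """Unfinished steps past their due date, for notifying the parties involved."""
        return sorted(
            name for name in self.deps
            if name not in self.done and self.due[name] < today
        )


# Hypothetical project: translation and terminology review run concurrently.
STEPS = {
    "analysis": (set(), date(2013, 12, 17)),
    "translate": ({"analysis"}, date(2013, 12, 20)),
    "terminology_review": ({"analysis"}, date(2013, 12, 20)),
    "sign_off": ({"translate", "terminology_review"}, date(2013, 12, 23)),
}

if __name__ == "__main__":
    workflow = Workflow(STEPS)
    print(workflow.ready())
    print(workflow.complete("analysis"))
```

Completing the analysis step opens both the translation and the terminology review at once, and the sign-off step becomes ready only when both have finished, with no project manager moving files by hand.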

Of course, there is little point in having a cloud translation system unless translators, reviewers and correctors can work directly within a browser-based environment. The recent advances in HTML and available programming libraries have made it possible to build a feature-rich, cross-browser translation environment with fully dynamic behavior. The key is to be able to provide all of the features required, including TM matching, machine translation suggestions, navigation aids, quality assurance and so on. In other words, a fully functional environment for language professionals.
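TM matching is the feature translators lean on most, so a rough sketch may help. The similarity measure here is Python's general-purpose `difflib.SequenceMatcher`, used as a stand-in; commercial tools use their own scoring, and the 0.75 threshold, the sample memory and the `best_match` helper are assumptions for illustration.

```python
from difflib import SequenceMatcher


def best_match(segment, memory, threshold=0.75):
    """Return (score, source, target) for the closest TM entry, or None.

    memory maps previously translated source segments to their targets.
    """
    best = None
    for source, target in memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold and (best is None or score > best[0]):
            best = (score, source, target)
    return best


# Hypothetical shared translation memory (English to Polish).
MEMORY = {
    "Click the Save button.": "Kliknij przycisk Zapisz.",
    "Restart the computer.": "Uruchom ponownie komputer.",
}

if __name__ == "__main__":
    print(best_match("Click the Save button.", MEMORY))    # exact match
    print(best_match("Click the Cancel button.", MEMORY))  # fuzzy match
    print(best_match("Invoice number", MEMORY))            # no usable match
```

Because the memory lives on the server, a segment confirmed by one translator can be offered as a match to every other translator on the project straight away.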

An online CAT tool must also allow multiple actors (project managers, translators, reviewers and correctors) to work on the same file at the same time, with everyone sharing translation assets such as terminology and TM even as they are being dynamically generated. Having a browser-based translation workbench means that all assets are held and available centrally in real time. Everything should be constantly backed up, and because nothing is installed locally, there are no software updates or incompatibilities to worry about. The user platform is irrelevant, so there is no reliance on one single work environment: the system can be accessed from a smartphone, tablet, Windows, Mac or Linux device.

Automation

The key to a well-designed cloud translation process is automation. A typical non-cloud process can incur large overhead costs (Figure 2). This is partly due to the considerable amount of manual intervention required to process a translation project without automation. Each point of manual intervention adds to the cost and is also a potential point of failure.

The cloud translation process provides for a fully automated environment, effectively eliminating all of the manual stages apart from the actual process of translation, review and correction. This is the intellectual transfer at the heart of the whole process, and the part that you can directly relate to what you are paying for. In Figure 3, all the processes on the green background are totally automated. The only things that cannot be automated are the actual translation and review/correct stages, which are done via the browser. In other words, everything is centralized and as automated as possible.

Another aspect of a cloud translation process is integration with external systems such as workflow and content management systems (CMS). Here the centralized and web-enabled nature of cloud translation systems makes seamless integration possible, such as the automatic transfer of data for translation once it is ready.

The future of translation is definitely going to be cloud-based, as this provides the best fit for the collaborative nature of localization. For the developed world, at least, the main technical issues holding back adoption of the cloud for the translation process, such as browser technology and web standards, have now been largely resolved. The cloud translation process is here and it is soon going to dominate the industry, as the alternatives are much less efficient.