SPONSORED: Artificial Intelligence (AI) is the biggest technology breakthrough of our times, and in its most accessible form, machine learning, it is disrupting many sectors across the whole spectrum of manufacturing and services. AI fundamentally changes how we humans interact, in our roles as consumers, citizens, patients, and passengers, with businesses, devices, robots, vehicles, governments, health systems, networks, machines, and with each other. One field that AI has already deeply penetrated is the language industry, but its potential and limits there have not yet been fully defined. TAUS calls this new subsector of the translation industry “language data for AI (LD4AI)”.
The LD4AI industry is a branch of both the much larger AI industry and the global language services industry. In the AI industry, researchers know that they can only do so much with algorithms and parameter settings. Again and again, they see that data outperform the models. They can’t do without the humans in the loop interpreting, annotating, and validating the data.
Emerging Business Opportunities for the Traditional Language Industry
While the need for human translation is changing, the urge among language service providers to diversify their service portfolio is on the rise. The innovators in the translation industry will grasp these opportunities.
TAUS was an early mover in the language data sub-sector with the launch of the TAUS Data Cloud in 2008. The TAUS Data Cloud allowed users to upload data and earn credits to download other users’ data. This reciprocal business model served many of the early users of statistical MT very well. The recipe was simple: more data was always better data. However, as the technology has evolved into neural networks and users have become much more sophisticated, the old reciprocal model of the Data Cloud no longer suffices. Users now need more domain-specific, high-quality data, as well as customization and even personalization. They want to track the origin of the data and access new languages with human-in-the-loop services.
To provide transparency about where data comes from, marketplaces have started to emerge where data owners can showcase their datasets to potential buyers. One of the most prominent examples is TAUS Data Marketplace, the language data monetization and acquisition platform. There, (freelance) translators and LSPs who have produced stock data over time can generate great business value from it. On an ethical level, translators and linguists use the marketplace to put their (digitally underrepresented) languages in front of potential buyers (global service providers in sectors ranging from automotive to IT and health). We’ve also gathered some Data Marketplace sellers’ motivations and success stories.
We now have enough evidence to see that the translation industry is in the midst of a paradigm shift. Computer-aided translation is making space for a data-first approach. The new focus is adding value to the data ahead of its modeling (front-loading the machine), rather than editing language after it is machine-processed.
As language data in an AI framework grows in importance, it seems inevitable that the role of human intervention will change. So how can we best use humans in the loop to add value to these data? On the one hand, humans will become more closely involved in checking, evaluating, and augmenting language content in AI workflows, as we have seen in the previous chapters. On the other hand, the profile of these human activities in the language industry will significantly evolve.
To deliver all these language services, data providers typically create crowdsourcing platforms and apps where all types of data can be generated and evaluated to deliver appropriate training output. These platforms, in turn, are creating new communities of human resources (or ‘crowds’) that play a new role within the translation industry economy, one we can describe as “cultural professional.” The skill set of this new role extends beyond language alone into the entire media universe found across cultures. By contributing knowledgeable annotations and evaluations to data training tasks, whether operating on voices, faces, images, truth judgments, or language, these contributors become “wide-band” cultural professionals for the part of the world or community they know best.
Crowd-based platforms are a new way to put humans in the spotlight. They place humans at the center of AI development: before an AI model can be developed, humans have to generate or annotate the data used to train it. One exemplary platform is TAUS HLP, where communities of native speakers are engaged to build datasets in a variety of high-demand domains, helping businesses reach their users in high-growth markets.
3. Data Optimization Services
Yet sourcing the data is only the beginning. It is now understood that a high volume of data does not, by itself, solve the problem. Data needs to be optimized through cleaning, anonymization, quality review, annotation, and clustering services to meet the requirements of each ML project.
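To make the cleaning and anonymization steps more concrete, here is a minimal Python sketch of how a small parallel corpus might be tidied before training. It is an illustration only and does not describe how TAUS or any particular service does it: the function name `clean_parallel_corpus`, the length-ratio threshold, and the email pattern are all assumptions chosen for the example, and real projects would typically add language identification, richer anonymization, and human quality review.

```python
import re

# Hypothetical settings for this sketch; tune them per project.
MAX_LENGTH_RATIO = 3.0
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def clean_parallel_corpus(pairs):
    """Return cleaned (source, target) pairs from an iterable of string pairs."""
    seen = set()
    cleaned = []
    for source, target in pairs:
        # Cleaning: collapse runs of whitespace and trim both sides.
        source = " ".join(source.split())
        target = " ".join(target.split())

        # Cleaning: drop pairs where either side is empty.
        if not source or not target:
            continue

        # Cleaning: drop pairs whose lengths differ too much,
        # which often indicates a misaligned segment.
        longer = max(len(source), len(target))
        shorter = min(len(source), len(target))
        if longer / shorter > MAX_LENGTH_RATIO:
            continue

        # Anonymization (very simplified): mask email addresses.
        source = EMAIL_PATTERN.sub("<EMAIL>", source)
        target = EMAIL_PATTERN.sub("<EMAIL>", target)

        # Cleaning: skip case-insensitive duplicates.
        key = (source.lower(), target.lower())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((source, target))
    return cleaned


sample_pairs = [
    ("Hello  world", "Hallo Welt"),
    ("hello world", "hallo welt"),
    ("", "leer"),
    ("Hi", "Dies ist ein viel zu langer Satz"),
    ("Mail me at jane.doe@example.com", "Schreib mir an jane.doe@example.com"),
]
cleaned_pairs = clean_parallel_corpus(sample_pairs)
```

In this sample, the duplicate, the empty segment, and the badly mismatched pair are dropped, and the email address is masked in the pair that remains. Annotation and clustering are left out because they depend heavily on the domain and on human judgment.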
TAUS provides on-demand data transformation services and off-the-shelf or custom data solutions for all AI projects. Using optimized datasets in any ML or AI application greatly increases output quality. TAUS case studies describe how domain-specific, optimized datasets improve the quality of MT output.
When it comes to machine translation, data is what defines the quality. If you’re in need of parallel language data, customized datasets, or any data-related services, please contact TAUS and let us help you optimize your business.