Language Technology in the Year 2022

Big products, great ideas, but still a lack of common standards

BY Arthur Wetzel

The Nimdzi Language Technology Atlas 2022 assesses the language-technology landscape, as it does every year. It is a comprehensive, free resource that maps hundreds of language-technology solutions worldwide. The Atlas compiles the information that Nimdzi uses to guide its clients through complex technology selection, deployment, and migration processes. For most people in the industry, it is an essential reference.

As the consolidation of technology and service providers in the industry continues, investors consult the Atlas to gain a deeper understanding of the dominant market participants. Linguists and buyers of language services use it to find out which resources are available to help them in their daily work. It also enables students in language programs around the globe to learn about the tools that will support their future occupations. Overall, the Atlas is a constantly updated tracking resource.

If you consider multilingual content one of the strong trends within the industry, certain technological processes become vital. Looking at current models in the industry, we may ask ourselves: Do we as an industry have all the tools in place? Are you able to assemble the right products to deliver multilingual content’s most important features? Yes, you may. But when it comes to interoperability standards (such as TAPICC) and the question of whether there is a reliable consensus regarding our industry’s infrastructure, the answer is no.

As Klaus Fleischmann, CEO of Vienna-based language tech company Kaleidoscope, once stated, translation as a sole business model is fading out, and technology-driven multilingual content creation is emerging as a meta trend, touching all areas of our industry, regardless of format or source type. If we look at the systems needed for this mission, the language industry this year points in the right direction.

The language localization industry tends to be very self-focused. It is common within this industry to expect customers to understand and appreciate the complexity of what it does. In the end, multilingual solutions should plug seamlessly into the more general global content process. For customers, localization should ideally be invisible, with clearly defined touchpoints to enter and exit it. Certainly the complexity should be taken care of by the actors in the industry, but we should not bother our customers so much with it.

We see those problems in the following review of Nimdzi’s 2022 Atlas, including a description of the methods underlying its compilation and the most significant technological shifts, as well as a thorough examination of trends, obstacles, and their respective solutions.

Methodology and New Alterations

This year’s Nimdzi Language Technology Atlas includes data from 800 technology suppliers. The data gathering behind the Atlas is based on four primary sources: ongoing research and ad-hoc requests, 50+ briefings and meetings with language technology firms, public data gathering, and the experience of research team members who regularly use and evaluate various language tools. These sources have given us a comprehensive understanding of the state of technology development in the industry.

The Atlas covers several categories, the first being translation management systems (TMS), which combine translation (and editing) environments with project-management elements. The second category is translation business-management systems. Unlike a TMS, these systems do not include a translation environment; they offer only translation project management. Such technology is called a BMS or (T)BMS, since it manages the business side of translation.

Then comes the category of audio-visual translation tools, which covers project and asset management tools and AI-enhanced dubbing solutions for audio-visual translation. Next, the Atlas covers machine translation: major MT engine brands, integrators that enable smart access to MT engines, and solutions that support MT processes so people can use MT effectively.

Following that, the Atlas covers marketplaces and platforms. This section features translation platforms and marketplaces, where clients can advertise a job and receive responses from linguists.

For interpreting systems, Nimdzi coined the term “virtual interpreting technology” (VIT) as an umbrella term for any technology used to deliver interpreting virtually. There are three approaches to virtual interpreting: over-the-phone interpreting (OPI), video remote interpreting (VRI), and remote simultaneous interpreting (RSI).

The section on translation-quality management has three divisions: QA tools, review and evaluation tools, and terminology-management tools. The last category is automatic speech recognition (ASR), which focuses on automated transcription and captioning.

Technology supports every step of the translation process. Copyright Kaleidoscope GmbH (AT) Klaus Fleischmann

Innovations, NLP, and blockchain

Some companies use blockchain, peer-to-peer payment tokens, and other concepts from the world of big IT. As the technology landscape evolves, so does our thinking about the areas of focus for the Atlas.

So far there have only ever been three disruptive innovations in the language industry: email, translation-memory software, and MT.

What innovations can we identify?

Natural language processing

The Atlas aims to investigate applied NLP solutions that relate to and expand localization capability. We concentrate on multilingual education and applications that stimulate the expansion of new markets. This section of the Atlas highlights a few solutions that have attracted notice, including Datasaur, Defined.AI, Lexalytics, NLP Cloud Playground, Telus, and the BLOOM model.

Localization and associated technologies are extending their bounds. Both multinational corporations with a significant localized presence and localization service providers are concentrating on new NLP application possibilities. In organizations where localization drives content strategy, localized experiences are not an afterthought but an integral part of the game. Even if the number of NLP applications in localization is small, we will continue monitoring NLP platforms and technologies.

Blockchain

In addition to Translateme, which we included in the 2021 Atlas and highlighted in the data related to Nimdzi 100, it’s crucial to consider Exfluency while addressing blockchain. Exfluency is a relatively new system built on the concept of privacy by design with the idea of creating a space for a secure multilingual asset store.

Exfluency offers two levels of anonymization (one for GDPR compliance, and another for anonymizing specific data following customers’ requirements). They now have a community of more than 1,000 users.

Language technology trends: TMS

According to the report, 92 percent of TMS managers say their companies use a commercial TMS or one developed in-house. This year, the Atlas references over 160 different TMS solutions, 10 more than last year. This means that regardless of how many options already exist, new solutions continue to emerge.

A look at the performance of specific companies shows the following:

  • LILT continues to produce fully managed, human-powered AI-assisted translation services. The start-up aims to facilitate the “entire customer journey” rather than a specific service or TMS technology.
  • According to Nimdzi’s 2022 TMS Survey, MemoQ and Memsource are the most popular brands. Memsource receives the highest percentage of positive feedback and the lowest rate of negative feedback.
  • RWS, which has now incorporated SDL’s suite of technology products, estimated around GBP 100 million (USD 125 million) in revenue from technology-enabled services by the end of 2021. Trados Enterprise directly replaces RWS Language Cloud in our Atlas.
  • MotionPoint and Smartling are two exciting players to keep an eye on. The former started as a pure-play tech provider and has been branching out to provide language services. The two companies started from opposite ends of the spectrum, which points to a trend to watch out for.
  • Smartcat is expanding its in-context preview features (MS Word, subtitles, HTML) and integrating with Figma. This should make Smartcat more accessible to people outside localization and to non-TMS users.
  • In 2022, Wordbee demoed its experiments around InDesign, dubbed Wordbee Link for InDesign. This native InDesign multilingual solution offers automated layout cloning, with source and target designs remaining fully independent. It also offers live updates, concurrent workflows, and a host of other new features.
  • XTM Cloud developed a query-management module that enables users to generate and access all queries without leaving the TMS. The company has also devised an algorithm that returns up to 30% more TM matches than the industry norm, saving time and money.

In 2022, the most common challenges in selecting a TMS are connectivity, security, and compliance with GDPR, HIPAA, ISO, PCI, and other standards and protocols. MemoQ, Memsource, RWS, XTM, and Smartling have all improved brand recognition over the previous 10 years, according to the Nimdzi survey. For new TMS developers, achieving top-tier mindshare may be difficult.

Terminology advancements and quality management

New capabilities for lexiQA include QA-as-you-type and review capabilities. An API-accessible LQA method eliminates most of the human labor required for quality assessment. LocalyzerQA, a new addition to our review and evaluation section, automates the in-context linguistic review of online, mobile, and desktop apps. Having mapped all error classes onto the MQM model, lexiQA can apply severity weighting to each error type, and automated QA checks provide scorecards based on the stated requirements. Also worth noting in terminology management are the long-standing favorite Quickterm (Kaleidoscope) and Congree.
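To make the idea of severity weighting concrete, here is a minimal sketch of how an MQM-style scorecard could be computed. It is not lexiQA’s implementation: the `QAError` class, the `scorecard` function, the severity weights, and the pass threshold are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical severity weights. MQM-style scorecards weight error
# severities differently, but these specific numbers are assumptions.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}


@dataclass
class QAError:
    category: str  # e.g. "terminology", "accuracy", "punctuation"
    severity: str  # one of the keys in SEVERITY_WEIGHTS


def scorecard(errors, word_count, per_words=1000, pass_threshold=20.0):
    """Sum weighted penalties per category and normalize by text length."""
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    by_category = {}
    for err in errors:
        weight = SEVERITY_WEIGHTS[err.severity]
        by_category[err.category] = by_category.get(err.category, 0) + weight
    total = sum(by_category.values())
    # Penalty points per `per_words` words, so texts of different
    # lengths can be compared against the same threshold.
    normalized = total * per_words / word_count
    return {
        "by_category": by_category,
        "total_penalty": total,
        "normalized_penalty": normalized,
        "passed": normalized <= pass_threshold,
    }
```

In this sketch, a 500-word text with one minor terminology error, one critical terminology error, and one major accuracy error accumulates 16 penalty points, or 32 points per 1,000 words, which fails the assumed threshold of 20.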

Data for localization

Last year, we analyzed AI and data for AI localization. The AI leader Appen created ADAP (Appen’s Data Annotation Platform), which allows the world’s most significant MT, ASR, and NLU solutions to buy and annotate training data. Appen’s proprietary tools include Ontology Studio, which creates multilingual ontologies for search relevancy and recommendation algorithms. Synthetic data can replace crowd-sourced linguistic data. It is generated to avoid the constraints of real-world data: it is cheap and contains no personal details, among other benefits, with about 1% of all market data currently synthesized. According to Gartner, the synthetic data industry will be worth $1.15 billion by 2027 (CAGR 48%).

Synthetic voices and AI-enhanced dubbing

Continuing the subject of synthetic data, the Atlas discusses the rapidly developing arena of synthetic voices and “AI dubbing.” Synthetic voices are also used in eLearning, educational materials, broadcasting, and advertising. This category includes Voiseed, which was already on our radar last year in the “AI-enhanced dubbing tools” subcategory of the Atlas. New additions to this category include Aloud by Google; Dubverse; the Abena AI app for Android, billed as “Africa’s first hands-free offline voice assistant”; the Common Voice project by Mozilla; the auto-dubbing solution Klling by the AI startup KLleon from Singapore; Klone; and Lava.

Amazon, Microsoft, and Meta

Here are a few breakthroughs from 2021 and 2022 from Microsoft, Amazon, and Meta. Microsoft Translator reached 100 supported languages in October 2021. Meta revealed NLLB-200, an AI model that can translate between 200 languages, in July 2022. Amazon announced another “language barrier break” in July 2022: Amazon Transcribe, Amazon Translate, and Amazon Polly were combined to create a near-real-time speech-to-speech system that translates a source speaker’s live voice input into a spoken target language without requiring ML expertise.

Machine translation

Regarding machine translation, the Atlas notes that adaptive MT engines can learn from corrections in real time. Known examples of such engines are LILT and ModernMT. Asynchronous retraining of a custom engine (e.g., daily) with qualified data can be more efficient than real-time learning based on unverified content.

New generation MT glossary

Nimdzi is already seeing MT glossaries move beyond a simple search-and-replace approach. For example, the MT engine DeepL launched a glossary feature in May 2020 that allows users to define and enforce custom terminology. MT engines are expected to improve, enabling everyone to use glossary terms with morphologically correct inflections.

DeepL also launched new language models that more accurately convey the meaning of translated sentences.
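The limitation of the older search-and-replace approach can be sketched in a few lines. This is an illustration only, not how DeepL or any other engine implements glossaries; the `apply_glossary` function and the German sample text are hypothetical.

```python
import re


def apply_glossary(text, glossary):
    """Naive glossary enforcement: replace exact whole-word matches only."""
    for term, preferred in glossary.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        # A function is used as the replacement so that backslashes in
        # the preferred term are not interpreted as escape sequences.
        text = re.sub(pattern, lambda match: preferred, text)
    return text


# Hypothetical MT output in German, with a glossary that prefers
# "Monitor" over the engine's choice "Bildschirm".
mt_output = "Der Bildschirm ist neu. Die Bildschirme sind neu."
print(apply_glossary(mt_output, {"Bildschirm": "Monitor"}))
# prints: Der Monitor ist neu. Die Bildschirme sind neu.
```

The singular form is replaced, but the plural “Bildschirme” is missed because it does not match the glossary entry exactly. Handling such inflected forms correctly is what a morphology-aware glossary is expected to do.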

TAUS Data-Enhanced MT

In 2022, a new TAUS service was launched: TAUS Data-Enhanced Machine Translation (DEMT). TAUS DEMT delivers affordable, customizable, high-quality MT output with a single click. The prices are 50% to 80% lower than the “human-in-the-loop” service. When it comes to TAUS, we also need to mention its newly established marketplace, which features fresh and highly effective corpora by Lexicala.

Fuzzy matches in MT engines

The report finds that the MTQE (machine translation quality estimation) features in MateCat and Memsource help users automatically estimate the quality of MT output, especially when there is no match from the TM. Anything below an 85% fuzzy match in Romance languages is potentially better handled by MT than by TM. Users can decide how to represent MT as a fuzzy match.
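The 85% rule of thumb can be pictured as a simple routing decision. The sketch below is an assumption-laden illustration: `fuzzy_score` and `route_segment` are hypothetical names, and the character-based similarity from Python’s standard `difflib` module only stands in for the proprietary fuzzy-match algorithms that TMS tools actually use.

```python
from difflib import SequenceMatcher


def fuzzy_score(a, b):
    """Similarity as a percentage (a stand-in for a real TM fuzzy score)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100


def route_segment(source, tm, threshold=85.0):
    """Return ("TM", translation, score) or ("MT", None, score).

    `tm` maps stored source segments to their translations. Segments whose
    best match falls below the threshold are routed to machine translation.
    """
    best_source, best_score = None, 0.0
    for tm_source in tm:
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_source, best_score = tm_source, score
    if best_source is not None and best_score >= threshold:
        return ("TM", tm[best_source], best_score)
    return ("MT", None, best_score)
```

With a TM containing “Click the Save button.”, an identical new segment scores 100 and is served from the TM, while an unrelated sentence falls below the threshold and is sent to MT. The threshold is a parameter because the appropriate cut-off may differ by language pair and content type.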

Tone of voice

Paying attention to the tone of voice helps a text appear more aligned and human. In translation, this is especially useful for conversational scenarios in languages where the level of formality matters. DeepL already has a trigger for formal/informal settings and features native tone-of-voice control.

A special focus on virtual interpreting technology

Remote interpreting has come out of the shadows to become the key to business continuity and care in many industries. While these platforms have been used for a long time, the onset of the pandemic drastically increased demand, and ever since, innovation and investment in this field have been unstoppable.

The adoption of VRI and RSI

Remote interpreting’s rise has helped bring interpreting into the mainstream. Vaccine centers around the US are equipped with portable, on-demand video remote interpreting (VRI) devices. This tendency is also evident in RSI. Since the pandemic began, RSI has gained new customers and is no longer confined to conference interpreting (its field of origin). RSI has expanded into other markets and seems omnipresent nowadays.

During the COVID-19 pandemic, conference interpreters were largely forced to adopt RSI. The audio quality of remote speakers is a major thorn in the side of RSI interpreters. Consultant interpreters and interpreters with their own clients tend to prefer being in the initial meeting when performing RSI. Other interpreters like to just log onto the RSI platform and do their thing, that is, interpreting, without having to worry about secondary devices, sound mixers, setting up the meeting for participants, and booth channels for interpreters. For interpreters who prefer this scenario, a standalone platform is probably the best bet because there is no interaction with the client.

Innovation in virtual interpreting and multilingual meetings

Last year, we included Zoom in our Language Technology Atlas for the first time. Webex by Cisco and the Google Meet simultaneous interpreting add-on are new additions to this year’s Atlas. After adding a relay to its simultaneous interpreting function in the spring of 2022, Zoom may have increased its appeal to both customers and interpreters.

UN research showed remote interpreting to be more stressful than an on-site job. Stress is heightened when interpreters can’t see each other. To overcome this issue, some systems give video and audio to interpreters so they can see each other. This reassures interpreters that their partner is present and may facilitate hand gestures. Interpreters are invisible to meeting attendees, but booth partners may see each other. So interpreters keep the seclusion of the booth while maintaining eye contact.

Up until this point, most RSI solutions have focused on scheduled events. Tech providers in the VIT space have started recognizing the potential of bringing RSI to live-streamed events. The main challenge is ensuring that the original audio and video and the interpreter’s audio output are well synchronized. Akkadu, a China-based RSI platform, uses RTMP technology to synchronize the different audio and video feeds so audiences get the whole experience. To further improve its live-streaming capabilities, Akkadu has recently released its own video player, which can be embedded into a client’s webpage using an iframe.

The need for a multilingual meeting provider (MMP) is growing, says Nimdzi. Many companies already describe themselves as facilitating multilingual meetings, and they do. However, interviews with market players show that clients’ needs are shifting, and buyers are increasingly looking for a provider that can do it all. Nimdzi has identified two solutions that may meet this need. The first is Bridge GCS, a virtual events platform that enables immersive and interactive experiences. The second is vSpeeq, which focuses on facilitating multilingual events.

The Atlas concludes that there is significant interest in the language-technology sector from a larger (than usual) audience, ranging from average consumers to major investors. Thanks to the emergence of AI, NLP, and MT, language technology is no longer seen as an afterthought in the language sector. This trend can be considered indicative of the ever-increasing interest in the arena. As visibility is essential to informed decision-making, we are glad to see this increasing popularity of the subject.

Finally, the researchers involved in compiling the 2022 Atlas ask readers to join forces to properly follow how the language-technology landscape evolves in the years to come.

Today, we may call all those highly individualized idea givers and idea keepers self-defined evangelists. So to those lang-tech nerds, perfectionists, and sole developers, those mid-sized and also big-sized tech companies: Come together, collaborate by means of strategic partnerships and mergers, and strengthen your position within the industry while pursuing a common goal.

Arthur Wetzel is an entrepreneurial leader with a long career of executing digital product and branding strategies to accelerate growth.
