AI

MIT CSAIL and Reviving Lost Languages

AI, Technology

Can the evolution of language inform machine translation models for extinct languages? Researchers at CSAIL think so. Jean-François Champollion did too.

If not for ancient Greek and Coptic – a descendant of ancient Egyptian – the decades-long effort to crack the Rosetta Stone could have stretched into centuries. For dead languages with few or no existing descendants, the task would appear impossible. Machine translation could help.

A project at MIT has been evolving throughout the past decade, as researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have sought to develop a system that can automatically decipher lost languages, even with scarce resources and an absence of related languages.

The team made headway in 2010 when Regina Barzilay, a professor at MIT, alongside Benjamin Snyder and Kevin Knight, developed an effective method for automatically translating the dead language Ugaritic into Hebrew. However, the more recent study considered this breakthrough relatively limited, since both languages derive from the same Proto-Semitic origin. Furthermore, the researchers found the approach too customized to work at scale.

To build on their initial findings, Barzilay and Jiaming Luo, a PhD student at MIT, have proposed a model that accounts for several linguistic constraints, particularly “patterns in language change documented in historical linguistics.”

One grounding principle here is that most human languages evolve in predictable ways. In particular, descendant languages rarely make drastic changes to sounds. A plosive “t” sound in a parent language could feasibly change to a “d” sound, but would very seldom evolve into a fricative like “h” or “s.”
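The sound-change constraint can be illustrated with a small sketch. This is not the CSAIL model: the feature table, the weights, and the function names `change_cost` and `rank_reflexes` are assumptions made up for illustration. The idea is simply that a change in voicing (“t” to “d”) should cost less than a change in manner of articulation (“t” to “s”), and less again than a change in both manner and place (“t” to “h”).

```python
from typing import NamedTuple


class Sound(NamedTuple):
    manner: str   # how the sound is produced (plosive, fricative, ...)
    place: str    # where in the vocal tract it is produced
    voiced: bool  # whether the vocal cords vibrate


# Tiny hand-written feature table; a real system would cover a full inventory.
SOUNDS = {
    "t": Sound("plosive", "alveolar", False),
    "d": Sound("plosive", "alveolar", True),
    "s": Sound("fricative", "alveolar", False),
    "h": Sound("fricative", "glottal", False),
}

# Hypothetical weights: changing manner is treated as the most drastic change.
WEIGHTS = {"manner": 3, "place": 2, "voiced": 1}


def change_cost(parent: str, child: str) -> int:
    """Sum the weights of every feature that differs between two sounds."""
    a, b = SOUNDS[parent], SOUNDS[child]
    return sum(
        weight
        for feature, weight in WEIGHTS.items()
        if getattr(a, feature) != getattr(b, feature)
    )


def rank_reflexes(parent: str) -> list[str]:
    """Order candidate descendant sounds from most to least plausible."""
    candidates = [s for s in SOUNDS if s != parent]
    return sorted(candidates, key=lambda child: change_cost(parent, child))


if __name__ == "__main__":
    for child in rank_reflexes("t"):
        print(f"t -> {child}: cost {change_cost('t', child)}")
```

Under these assumed weights, “d” ranks as the most plausible descendant of “t,” followed by “s” and then “h,” which matches the intuition described above. A decipherment model could use a cost of this kind as a soft penalty when aligning characters across languages.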

Along with these constraints, the approach also draws on history. As the algorithm deciphers patterns in sounds and syntax, it will also pull from encyclopedic data to fill in some of the blanks.

“For instance, we may identify all the references to people or locations in the document which can then be further investigated in light of the known historical evidence,” Barzilay told MIT News. “These methods of ‘entity recognition’ are commonly used in various text processing applications today and are highly accurate, but the key research question is whether the task is feasible without any training data in the ancient language.”

While imperfect, these methods have so far made progress. The team found the algorithm could identify language families, and one instance corroborated earlier findings that Basque – a language spoken in a region of northern Spain and southwestern France – appeared too distinct to assume any linguistic relation.

The team hopes eventually to develop a method of automatically identifying the semantic meaning of words with or without a linguistic relation. Like the linguists who cracked the Rosetta Stone, CSAIL researchers could be on the verge of a paradigm shift.

MultiLingual creates go-to news and resources for language industry professionals.

Straker Wins Major Contract with IBM

AI, Business News, Localization

Shares of New Zealand-based Straker Translations (ASX.STG) jumped almost 45% today (November 11 in New Zealand) on the announcement of a strategic two-year agreement with IBM starting in January 2021.

Straker’s AI-based RAY platform runs on IBM Cloud, and integrates seamlessly with IBM’s technology platforms. It outperformed other technologies that were considered in the selection process. Of particular note is the ability to take on IBM’s global media localization to provide multimedia content in 30 languages.

The localization company already provided localization services into Spanish, and will now expand its portfolio to 55 languages in support of IBM cloud services, IBM adaptive translations, and IBM global media localization. Volumes have not been disclosed, but Straker expects significant growth in revenue and a 30% increase in headcount to handle the new languages.

“This agreement is a recognition of the outstanding capabilities of our technology to handle a large volume of translation that is currently managed internally at IBM. Our talented team will be able to achieve major productivity gains with AI-powered RAY platform,” Straker CEO and co-founder Grant Straker told MultiLingual.

After IBM announced last month that it was restructuring by spinning out its infrastructure services business, IBM CEO Arvind Krishna made it clear that his focus is going to be on transforming the organization into a hybrid cloud management vendor. This is certainly a good sign for Straker and its shareholders.

 

 

Katie Botkin, Editor-in-Chief at MultiLingual, has a background in linguistics and journalism. She began publishing "multilingual" newsletters at the age of 15, and went on to invest her college and post-graduate career in language learning, teaching and writing. She has extensive experience with niche American microcultures across the political spectrum.

Resemble.ai Launches AI Tech That Mimics User Voice

AI

The new voice localization AI from Resemble.ai will translate the user’s voice across languages, initially supporting English, French, German, Dutch, Italian, and Spanish.

Resemble.ai, which works on generative deep learning voice technology, recently announced that it has created Localize, a voice AI technology that localizes speech. Generally speaking, entertainment companies, ad agencies, call centers, and companies that need to translate voices use a different dubbed voice in each language. According to Resemble.ai, however, user voices will carry into any language with Localize, meaning the speaker’s voice will remain consistent even when translated.

Resemble.ai claims to clone voices at scale in seconds, as opposed to weeks. It has taken the previously laborious, expensive process and cloned 42,000 voices for 65,000 users, including two of the largest global telecoms, two of the largest consulting companies, a top global broadcasting company, two of the largest entertainment conglomerates, one of the largest toy makers, and the leader in airport communications systems.

Localize will be compatible with video games, movies, call centers, company videos, and more as they are translated to and from languages including English, French, German, Dutch, Italian, and Spanish, with upcoming plans to introduce Localize for Korean, Japanese and Mandarin.

Normally, voice translation takes an average of two months and can cost companies hundreds of thousands of dollars. For entertainment companies, dubbing a script is logistically challenging and the fidelity of the production is oftentimes lost in translation. This new voice technology aims to accomplish the equivalent volume in a week with maximum creative flexibility and efficiency.

“It’s hard to overstate how important audio has become in recent years — or just how much bigger it’s going to get in an AirPods-first world,” said Peter Rojas, partner at Betaworks Ventures. “Synthetic voice is going to be key to all this by transforming how audio is created. Demand for localized and translated spoken word content, whether it’s in the form of podcasts or audiobooks, is exploding, and AI-based tools like Localize are the way to satisfy that demand.”

GreenKey Creates NLP Tool for Hedge Funds

AI

Focus Studio, the new application from GreenKey, will provide users with natural language processing workflows specific to hedge fund management.

Bank sales teams often turn to natural language processing (NLP) to find client insights, such as OTC quotes and trades, within emails, direct messages, and phone calls, and to manage increasing amounts of conversational data. This type of computational power is theoretically possible cross-linguistically as well, which has interesting implications for the language services industry. Hedge funds, however, process different types of text and have their own requirements for generating trade ideas. To address that need, GreenKey, creator of NLP workflows for sales and trading, has released its latest version of the “Focus Studio” application.

Users of Focus Studio can customize NLP to go through various files and deliver highlighted insights as daily reports or power real-time automation, such as chatbots. This latest version of Focus Studio now includes NLP models designed specifically for hedge funds to help them cope with the amount of unstructured text they process.

Based in Chicago with offices in New York and London, GreenKey is the creator of a patented speech recognition (ASR) and NLP platform that recognizes complex jargon across real-time audio and text sources and transforms them into actionable insights. GreenKey converts disparate communications streams into structured data tools that help banks, trading firms, and emergency services operators automate complex workflows.

GreenKey trains the new NLP models on real sell-side human analysts to capture their insights, and includes the ability to rapidly customize those models through a quick annotation process. Traders will select from the base models, called “trusted curators,” and can even ask their favorite sell-side research analyst to create and contribute one. The custom model collection can be fed thousands of documents and will identify trending topics, intents, and entities, and can even provide innovative raw sentiment scores such as “word disfluency.” The pre-trained models also include in-depth product knowledge across global fixed income, credit, equities, FX, and commodity markets.

“NLP is already changing the way sales and trading occurs on the sell-side, enabling a wave of automation and insight generation across various workflows,” said GreenKey Founder and CEO Anthony Tassone. “Now the buy-side can begin to leverage NLP to automate and scale their analysis, while retaining the ‘trusted curator’ role of the sell-side research provider and analyst.”

GlobalLink AI Portal Delivers Over 1 Billion Words Per Month

AI

Surpassing one billion words per month, TransPerfect’s GlobalLink AI Portal solution has doubled in usage this year, as TransPerfect signs on several new clients.

TransPerfect, one of the world’s largest providers of language and technology solutions for global business, announced this week that the adoption of its GlobalLink AI Portal machine translation (MT) solution for corporate clients has surpassed the one billion words per month milestone and continues to grow.

With the addition of new clients like Cushman & Wakefield, HARMAN International, and Cummins, GlobalLink AI Portal usage has more than doubled this year. Serving over 5,000 global organizations, GlobalLink Product Suite simplifies management of multilingual content. TransPerfect has seen long-term clients who use the service achieve an increase of 15% in quality scores over the previous year, demonstrating the AI solution’s capacity to continuously improve in a secure environment. The increased quality allows clients to reduce the scope and scale of necessary post-editing, saving time and cost.

Among TransPerfect’s numerous language and technology solutions, GlobalLink AI Portal focuses on real-time self-service MT and supports more than 40 different languages and 30 different file formats. The solution makes neural MT more accessible to corporate clients looking to integrate the technology into their business workflows. Offering a hybrid approach, TransPerfect combines AI and human translation to help clients achieve an optimal position on the quality-cost translation matrix for the content’s end use.

Furthermore, GlobalLink AI Portal offers unique security features that include the use of certified colocation facilities, encryption, secure HTTPS access, optional deactivation of data storage, single sign-on (SSO) integration, and user permissions and hierarchies.

TransPerfect President and CEO Phil Shawe said, “Efficiency and security are two of the pillars on which our company has operated for over 25 years. I’m happy to see the marketplace’s rapid adoption of our GlobalLink AI solution, but I’m even more pleased to know that we’ve delivered this technology in a way that both drives productivity and respects privacy.”

Anuvadak Platform Translates India’s MyGov COVID Site

AI, Localization

Expanding COVID-19 information to 10 of India’s 22 official languages will give millions of Indian internet users access to vital information. Reverie Language Technologies hopes this is just the beginning for Anuvadak, its website localization AI.

In an effort to boost access to the internet among India’s multitude of languages, Reverie Language Technologies has leveraged its machine translation AI Anuvadak to automatically publish the MyGov COVID-19 page in ten Indian languages.

Launched as a platform to publish websites in Indian languages, Anuvadak accelerates the process of localizing content to better serve the needs of 536 million Indian-language internet users. Along with translating language using neural machine translation, the platform can also automatically update websites, manage workflows, and optimize SEO search results using built-in web analytics.

“It is a platform that accelerates the process of creating, launching, and optimizing your website in multiple languages,” said Reverie Language Technologies CEO and Co-founder Arvind Pani in a recent interview. “The platform enables you to connect with customers in their language with faster go-to-market and effortless content management. Anuvadak can scale down the website localization time by 40% and can save as much as 60% of the localization and content management costs.”

After winning the QPrize in 2011, Reverie Language Technologies became the first company to offer language computing solutions for all 22 official Indian languages. However, despite India’s claim to the world’s second-largest English-speaking population, only around 10% of the Indian population speak English. Accordingly, the vast number of internet resources serve a minority of Indian internet users.

As COVID-19 cases continue to rise, access to information in one’s native language is still vital globally, and many around the world are calling for efforts to deliver information in a timely manner. The increased language capacity on India’s MyGov website will play a major role in disseminating life-saving information as epidemiologists gather new information about the virus. Furthermore, an internet with broader localization strategies will provide Indian internet users with more equal access to opportunities in business, education, and cultural exchange.

“We are focused on building products to address all user engagement aspects, be it input, search, voice, translation, or localization,” said Pani. “We plan to empower more number of the rapidly rising Indian-language users with our language by enabling large businesses and governments to connect with more people in regional languages.”

Although a great effort is still needed to deliver access to information through the internet and technology to India’s diverse language speakers, Anuvadak will contribute to the broader effort to serve Indian internet users.

The GPT-3 AI Language Model Might Be on Your Newsfeed

AI

OpenAI’s GPT-3 was released privately for beta testing in July. The AI language model has already generated tens of thousands of views and dozens of subscribers on one beta test blog from many unwitting readers.

When OpenAI released its GPT-3 AI language model for beta testing last month, the San Francisco-based AI research and deployment company was aware that issues might arise. After all, following the racist, homophobic, and misogynistic language generated by GPT-2, GPT-3 could easily fall into similar patterns.

To prepare for such outcomes, OpenAI decided to launch the language model in beta to limit its capacity to stray into problematic territory. Although the company released the beta primarily to university and industry researchers, one computer science major at the University of California, Berkeley reached out to a PhD candidate to request access to GPT-3.

Once the graduate student agreed to collaborate, the undergraduate, Liam Porr, wrote a script for him to run. The script gave GPT-3 a headline and introduction for a blog post and instructed it to generate multiple completed versions. Porr then created Adolos, a blog he would use to test his hypothesis that the AI could convince an audience that the blog was written by a human.

Porr did as little as creating a title and introduction, choosing a photo, and copy-pasting from one of the outputs with little to no editing. After two weeks, the blog had over 26,000 visitors and 60 subscribers, with one post even making it to the number one spot on Hacker News. Furthermore, while a few readers suspected the posts had been written by GPT-3, many of those comments were subsequently down-voted by community members.

One of the tricks Porr discovered, which would allow the algorithm to function at a more convincingly human level, was choosing the right subject matter. Although the GPT-3 language model is far larger than GPT-2, it still struggles to produce language in a rational, logical way. Indeed, even OpenAI’s first use of the model was to write a poem about former board member Elon Musk.

Focusing on subject matter that utilizes more emotional, creative language, Porr settled on productivity and self-help. He then searched through Medium and Hacker News articles to emulate titles related to those subjects and let the AI loose.

After conducting this two-week experiment, Porr wrote a post on his blog — without the help of GPT-3 — discussing his findings and the implications of OpenAI’s newest language model. With the promising efficiency of GPT-3, the model could have a major impact on the future of online media, according to Porr. The experiment comes during global discussions around the ethics of AI.
