Tag: NMT


TED, SYSTRAN Partner, Create Multilingual NMT Models

Translation Technology

Beginning with ten languages, SYSTRAN will use TED content to develop neural machine translation models for technical content in a variety of fields.

AI-based translation technology company SYSTRAN recently announced a new partnership with TED to build specialized neural translation models based on high-quality translations of TED Talks. These models are designed to meet the sophisticated translation needs of multinational companies, educational institutions, government agencies and other organizations by enabling accurate and fluent translations of learning, scientific, business, and technical content in ten languages.

A nonprofit organization whose slogan is “Ideas Worth Spreading,” TED has committed to global language access as one of its core foundations. Organizations in 150 countries participate in the TEDx initiative, which allows groups to apply for licenses to organize conferences made up of local participants, ranging from professors to scientists to writers.

Along with TEDx, the organization currently runs a major translation initiative for its online resources, with a team of over 35,000 human translators who have produced almost 175,000 translations and captions in 115 languages. The data from this major cache of language resources will likely enable SYSTRAN to expand its neural translation models to even more languages.

“SYSTRAN is TED’s first-ever authorized partner in bringing together TED content and machine learning to develop a commercial product,” said Alex Hofmann, Director, Global Distribution & Licensing at TED. “The fact that our inaugural collaboration in the AI space is focused on neural machine translation models built from translations of TED Talks in multiple languages feels natural, and [the models] are now available on a licensed basis to help enterprises and organizations meet their most sophisticated translation needs.”

The proprietary models are developed by SYSTRAN, pairing TED’s unique multilingual data and SYSTRAN’s AI expertise, and are an early step in advancing data usage in wider applications. TED requires a license for authorized use of its data for commercial AI and machine learning purposes, and SYSTRAN is the first to obtain such a license. In accordance with SYSTRAN’s core principles of security and data privacy, TED fully preserves its intellectual property and ownership of its data as well as the specialized models. The TED-owned models are available on the SYSTRAN Marketplace, a catalog of specialized models for specific domains such as legal, finance, health, education, science/technology and many more.

“This strategic partnership is about taking our shared goals of connecting people and cultures and facilitating multilingual engagement globally,” said CIO of SYSTRAN, John Paul Barraza. “The human-created translations generated by the TED Translator community are of the highest quality, enabling SYSTRAN to build accurate and fluent translation models for use across a plethora of business and professional applications.”

SYSTRAN conducted double-blind human evaluations on the TED models it built, and the results show improvements in accuracy and fluency over baseline state-of-the-art generic models. The human evaluations also revealed unexpected results, with 41% of the models scoring higher than the human reference translations.

“The current global situation is showing us how inter-connected the different countries and populations worldwide are. Companies are imagining a world with far less boundaries — starting with the way we communicate,” said Jean Senellart, SYSTRAN CEO. “Introducing models to the SYSTRAN Marketplace is an incredible opportunity and will respond to real needs in the translation of educational, business, scientific, and technical materials.”


MultiLingual creates go-to news and resources for language industry professionals.


Related News:


In Technology, Things Never Slow Down

Translation Technology

I am of an age where I can recall the pre-email intra- and extra-office communications process. Both were served by what is now called snail mail. External communications got a stamp and were posted at the end of every day. Internal communications involved a dedicated person traveling around the offices handing out memos in large brown envelopes tied with string. If you were on the recipients’ list, the brown envelope was placed in your In-tray by the office postal clerk for you to peruse at your leisure.

Once you had read the contents and added a note of comment to them, you ticked off your name to show that you had read them and placed them in your Out-tray. The brown envelope was then moved on by the clerk, who spent his day walking the corridors to carry out this vital task.

You know what, the process worked, albeit at the pace of a crocked snail. But that’s how the world was back then. People neither expected nor demanded that things be addressed immediately.

Then some office IT genius spotted this new technological advancement that was sweeping the world. It was called email. The technology was duly introduced and we all received training on how this new-fangled invention worked. The old brown envelopes disappeared and the postal clerk put on a lot of weight from lack of exercise. But for worse (or for better?), the pace of work in the office was ramped up immeasurably. Suddenly messages were being received in your electronic In-tray and expectations grew that a message received should be answered immediately, if not sooner. Decision-making became a nanosecond exercise.

Indeed, people sitting only feet from you would “ping” (that was a new word for us) an email to you, rather than simply shout across the office or talk to you over the water cooler. The introduction of this new technology changed the face and the pace of every office. It put the office into an overdrive from which it has never really decelerated. I tell you this “All of Our Yesterdays” anecdote to demonstrate how technology begets change, and how that change is often one of speeding up processes. Seldom does new technology aim to slow things down.

This speeding up is being driven by the constant evolution and improvement in the capacity of computers to crunch and process data. As the physical hardware gains more computational power, with super processing chips, that power is used to process and spit out huge corpora of data at breathtakingly fast speeds. But even this power is not proving sufficient as companies hunger for faster and cheaper solutions to their growing need to process huge amounts of data at almost real-time speeds. Research is already at an advanced stage on replacing the silicon chip with a new technology based on the carbon nanotube. And on and on it will go.

Neural machine translation evolution

Neural machine translation (NMT) too has been evolving at a breakneck pace. NMT development has moved at five times the pace of earlier statistical machine translation (SMT) research, and the developments in industry bear this out. Google replaced a system it had developed over the course of 12 years with a new NMT system it developed in just over 18 months. With these developments comes the improvement of outcomes and capabilities. The rapid evolution of NMT has been served by the huge amount of time and effort being put into research by many of the giants of industry. This factor, married to the development of faster and more affordable hardware, has helped meet the ongoing demand for more speed and computational power. Google is working with a start-up company called Nervana Systems that is developing the Nervana Engine, an ASIC processor that increases current processing speeds by a factor of 10. Not surprisingly, Nervana Systems was bought by Intel in 2016.

It is no surprise that NMT, which is a model inspired by the workings of the human brain, is greedy for the speedy processing of huge corpora of complex data. And it is a sobering thought that the average human brain processes data at 30 times the speed of the best supercomputers. Fortunately, with the advance of Deep Learning, NMT requires only a fraction of the memory needed for traditional SMT. Whereas email was demanded because the world needed to speed up inter- and extra-office communications, the development of NMT is being driven by several forces: the proliferation of mobile devices and in-home control systems; the rise of social media and the demand for real-time communications; the growth of e-commerce as a market opportunity for companies; and the growth of Big Data, with its insatiable appetite to crunch and understand huge amounts of data now, in multiple languages and at an affordable cost.

The adoption of NMT by behemoths such as Google has given this language solution the blessing that it is a technology worthy of investment and research. And, as is the way in industry, once one giant adopts a system, the other equally powerful entities feel the need to develop their own. Facebook too has joined this race. Indeed, the top companies in the world, including Microsoft, Google, Amazon, eBay and Facebook to name but a few, have ongoing investment and research in NMT. With the R&D spending power of these companies, it is no wonder that the development of NMT has gathered such pace. In fact, NMT is expected to surpass all other MT models and to grow to a market share of $46 billion by 2023.

The objective of NMT development is no small one. In essence, it can be defined as advancing a system that will allow people anywhere in the world to connect with anyone and understand anything in their own language. Add to that the need for quality and speed and you can see the mountain NMT has to climb, and has been successfully climbing. Achievement of that objective is getting closer. Google, for example, supports 103 languages, translates 100 billion words per day (you read that right!) and communicates with the 92 percent of its users who are outside the USA.

Those are staggering figures. But if companies want to grow their brands, open up fertile new markets and keep their shareholders happy, then these are the levels they must reach to keep pace with developments in NMT. And we are not only referring to the written word: more and more of the demand is for the spoken word, with the growth of voice-activated technology and household “gadgets” such as Amazon’s Alexa, Google’s Home and Apple’s HomePod (and that list is growing). The future of NMT is being further cemented by its adoption in key industries such as Military & Defence, IT, Electronics, Automotive and Healthcare, to name just a few.

NMT has now been taken up by all serious language service providers (LSPs). The debate is ongoing as to how this will impact the current LSP model. Undoubtedly, the role of the human translator is evolving into that of an editor rather than a translator. Pricing models are changing from the traditional price per word, based on word volumes, to a time-measured rate. An expert at eBay has predicted that the traditional translator will evolve to become “… data curators of corpora for MT.” Our founder Tony O’Dowd has a bleaker assessment for the human translator when he says, “the traditional approach to translation is dead (or in its twilight zone).” But one thing seems sure: NMT, like email, is not going to go away. Speed is of the essence. That is the eternal watch-cry of technology.


Aidan Collins is a language industry veteran. He began his localization career as desktop publishing manager at Softrans (later bought by Berlitz) in 1991. In the years since, he has held senior management positions in both major LSPs and global technology companies. He is currently marketing manager with KantanMT.
