The Association for Machine Translation in the Americas has published the program for the 2022 conference. Papers at this year’s industry event represent innovations in machine translation – both in the science and the craft of implementation.
Paper topics for AMTA 2022 cover MT evaluation, deep learning, data filtering and manipulation, low-resource models, MT impact on business, domain-specific MT, sign language MT, chat translation, search translation, non-referential models for MT quality forecasting, use cases for large language models, sustainability in MT, automatic dubbing and interpretation, and large-scale government deployments.
We list 9 highlights out of 64 papers that will be presented at the conference.
How to run a governance board for responsible AI
Paper: Machine Translation as a Prototype for Advanced AI Deployment in Government (Kathryn Baker, US Dept of Defense)
Kathryn Baker describes a real-world project where MT models based on open source were integrated for the Department of Defense. While open-source models and data are readily available on the Internet, government institutions must be very prudent and responsible in vetting them, consider the impact of using particular data sources on individuals’ privacy, and understand copyright issues associated with their data sets. The presentation covers how governance boards can be established for this purpose and proposes a review process for Publicly Available Information in AI models. Lessons learned can be applied to the rapid ramp-up of other NLP models.
Amazon’s Take on Search in E-Commerce
Paper: Improve MT for Search with Selected Translation Memory using Search Signals (Hang Zhang, Amazon)
E-Commerce search engines achieve multilingual search by machine translating search queries before searching the index in its primary language. Hang Zhang proposes a method of improving search results by checking translation memory entries first, including fuzzy matches when the TM entries are only sub-strings of a customer search query. The secret sauce is to select TM entries using search signals.
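The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the method from the paper: the function name `translate_query`, the `signal` scores, the `min_signal` threshold, and the `mt` callback are all hypothetical, and the summary does not say which search signals Amazon uses or how they are scored. Stitching the pieces back together in source word order is also a simplification.

```python
from typing import Callable, Dict, Optional, Tuple


def translate_query(
    query: str,
    tm: Dict[str, str],
    signal: Dict[str, float],
    mt: Callable[[str], str],
    min_signal: float = 0.5,
) -> str:
    """Translate a search query, preferring trusted TM entries over raw MT.

    `signal` maps a TM source string to a hypothetical search-signal score
    (for example, derived from click data); entries below `min_signal`
    are ignored.
    """
    normalized = query.lower().strip()
    # Keep only TM entries that the (assumed) search signal supports.
    trusted = {s: t for s, t in tm.items() if signal.get(s, 0.0) >= min_signal}

    # 1. Exact match: the whole query is a TM entry.
    if normalized in trusted:
        return trusted[normalized]

    # 2. Sub-string match: find the longest TM entry that covers a
    #    contiguous run of whole words inside the query.
    words = normalized.split()
    best: Optional[Tuple[int, int]] = None
    for start in range(len(words)):
        for end in range(len(words), start, -1):
            span = " ".join(words[start:end])
            if span in trusted and (best is None or end - start > best[1] - best[0]):
                best = (start, end)

    # 3. No usable TM entry: fall back to machine translation.
    if best is None:
        return mt(normalized)

    start, end = best
    parts = []
    if start > 0:
        parts.append(mt(" ".join(words[:start])))
    parts.append(trusted[" ".join(words[start:end])])
    if end < len(words):
        parts.append(mt(" ".join(words[end:])))
    return " ".join(parts)
```

Filtering the TM by signal before matching means a low-scoring entry never overrides the MT output, even when it matches the query exactly.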
MT Analytics at VMware
Paper: Data Analytics Meet Machine Translation (Allen Che, VMware)
This technical paper describes the process of building a business intelligence framework for machine translation, starting from collecting the daily operation data, cleaning it, and building the analytics to get insights into MT quality. Allen Che will present how to build the data collecting matrix, the cleanup script, and an automation script for the analytics. The presentation includes different visualized reports, such as box plot, standard deviation, mean, MT touchpoint, and golden ratio reports.
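As a rough illustration of the cleanup and summary steps, the sketch below computes the numbers behind a mean, standard deviation, and box plot report using only Python's standard library. The record layout (a `score` field per translated segment) and the function names are assumptions made for this example; the paper's actual data model and scripts are not described in this summary.

```python
import statistics
from typing import Dict, List, Optional


def clean_records(records: List[Dict[str, Optional[float]]]) -> List[float]:
    """Drop records with a missing score (hypothetical 'score' field)."""
    return [r["score"] for r in records if r.get("score") is not None]


def summarize_scores(scores: List[float]) -> Dict[str, float]:
    """Return the statistics needed for mean, std-dev, and box plot reports."""
    # quantiles(n=4) returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),  # sample standard deviation
        "min": min(scores),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),
    }


if __name__ == "__main__":
    daily = [{"score": 1.0}, {"score": None}, {"score": 2.0},
             {"score": 3.0}, {"score": 4.0}, {"score": 5.0}]
    print(summarize_scores(clean_records(daily)))
```

The five-number summary (min, q1, median, q3, max) is what a box plot draws, so the same dictionary can feed both the tabular and the visual report.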
Singapore Government Launches MT
Paper: Singapore Translate Together (Adeline Sim, Singapore’s Ministry of Communication)
This is a reveal of Singapore’s effort to provide and maintain MT in four official national languages: English, Chinese, Bahasa Melayu, and Tamil. Launched on 27 June 2022, the SG Translate Together (SGTT) web portal houses the engines and translation-related news. This presentation will briefly cover the methodologies adopted to build the NMT engines for Singapore local content related to Government communications, as well as the establishment of an incremental learning model leveraging the ensemble technique for further training with new data.
Working Machine Interpreting/Dubbing Has Arrived
Papers: A Multimodal Simultaneous Interpretation Prototype: Who Said What (Xiaolin Wang, National Institute of Information and Communications Technology), and Lingua: Addressing Scenarios for Live Interpretation and Automatic Dubbing (Nathan Anderson, Carnegie Mellon University)
Two presentations cover the up-and-coming technology to machine interpret conferences, live meetings, and live streams.
Nathan Anderson (with co-author Steve Richardson, AMTA President) presents Lingua, an application developed for the Church of Jesus Christ of Latter-day Saints that performs both interpretation of live speeches and automatic video dubbing using a traditional ASR–MT–TTS pipeline. The app’s unique contribution is that it can also operate in real time, with a delay of a few seconds, to interpret live speeches. It works with or without a source-language script; using a script can result in greater accuracy.
Xiaolin Wang from NICT in Japan focuses on attributing phrases in the video stream to speakers. While conventional simultaneous interpretation systems merely present “what was said” in the form of subtitles, the proposed NICT multimodal system uses images to recognize the speaker of each sentence, and then annotates its translation with the textual tag and face icon of the speaker, so that users can quickly understand the scenario.
XSTS – A new human evaluation metric
Paper: Consistent Human Evaluation of Machine Translation across Language Pairs (Philipp Koehn, Johns Hopkins University)
Obtaining meaningful quality scores for machine translation systems through human evaluation remains a challenge given the high variability between human evaluators, partly due to subjective expectations for translation quality for different language pairs.
A research group that includes Philipp Koehn of Johns Hopkins University and a Meta team with Francisco Guzman, Daniel Licht, Cynthia Gao, Janice Lam, and Mona Diab proposes a new metric to remedy that. XSTS focuses on semantic equivalence and is paired with a cross-lingual calibration method that enables more consistent assessment. The metric’s effectiveness is demonstrated in large-scale evaluation studies across up to 14 language pairs, with translation both into and out of English.
25% Faster Neural MT
Paper: Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU (Hossam Amer, Microsoft)
This scientific paper presents a way to increase the speed of neural machine translation via clustering for systems that use the transformer architecture on GPUs. Disjoint clusters have much smaller vocabulary columns for vocabulary projection. At the time of translation, the new method predicts the clusters and candidate active tokens for hidden context vectors during vocabulary projection. This paper also includes an analysis of different ways of building these clusters in multilingual settings.
Results: an overall speedup of 25% while maintaining quality. The vocabulary projection step itself is 2.6x faster.
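The general idea of projecting onto one cluster's tokens instead of the full vocabulary can be sketched as below. This is a toy, CPU-only illustration in plain Python, not the paper's GPU method: the cluster is predicted here by the highest dot product with a cluster centroid, which is an assumption of this example, and all function names are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def centroid(rows: List[Vector]) -> List[float]:
    """Mean vector of a cluster's output-embedding rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def full_projection(hidden: Vector, weights: List[Vector]) -> int:
    """Baseline: score every token in the vocabulary and take the best."""
    return max(range(len(weights)), key=lambda tok: dot(hidden, weights[tok]))


def clustered_projection(
    hidden: Vector,
    weights: List[Vector],
    clusters: List[List[int]],
    centroids: List[Vector],
) -> int:
    """Predict one cluster, then score only that cluster's tokens."""
    best_cluster = max(range(len(centroids)), key=lambda c: dot(hidden, centroids[c]))
    return max(clusters[best_cluster], key=lambda tok: dot(hidden, weights[tok]))


if __name__ == "__main__":
    # Four-token toy vocabulary split into two disjoint clusters.
    weights = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    clusters = [[0, 1], [2, 3]]
    centroids = [centroid([weights[t] for t in c]) for c in clusters]
    hidden = [0.2, 1.0]
    print(full_projection(hidden, weights),
          clustered_projection(hidden, weights, clusters, centroids))
```

With a vocabulary of V tokens split into k clusters of similar size, the clustered path scores roughly k + V/k vectors instead of V, which is where the savings in the projection step would come from. The trade-off is that a wrong cluster prediction can miss the best token.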
MT to translate into languages it has never seen before
Paper: Adapting Large Multilingual Machine Translation Models to Unseen Low Resource Languages via Vocabulary Substitution and Neuron Selection (Mohamed A Abdelghaffar, German University in Cairo)
Can an NMT neural network translate into and from a language it has never seen before? With this paper, a research group in Cairo proposes a method to use large multilingual machine translation models and their training data from high-resource languages (HRLs) to understand low-resource languages that were not included in the model training. They use neuron-ranking analysis to select neurons that are most influential to the HRL and fine-tune only this subset of the deep neural network’s neurons. The group claims that their method improves by up to 3 BLEU points on both zero-shot translation and the stronger baseline of directly fine-tuning the model on the low-resource data. If successful, this method can significantly increase the availability of MT for very low-resource languages.
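The "fine-tune only a ranked subset of neurons" step can be pictured with the toy sketch below. It assumes each neuron already has an importance score (how the paper computes its ranking is not covered in this summary), keeps the top fraction, and applies a gradient step only to the weight rows of the selected neurons. The function names, the `fraction` parameter, and the plain gradient-descent update are hypothetical choices for illustration.

```python
from typing import List, Set


def select_neurons(importance: List[float], fraction: float) -> Set[int]:
    """Return the indices of the top-`fraction` neurons by importance score."""
    k = max(1, round(len(importance) * fraction))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return set(ranked[:k])


def masked_update(
    weights: List[List[float]],
    grads: List[List[float]],
    selected: Set[int],
    lr: float,
) -> List[List[float]]:
    """Apply one gradient step to selected neurons; freeze all others.

    Row i of `weights` holds the incoming weights of neuron i.
    """
    updated = []
    for i, (w_row, g_row) in enumerate(zip(weights, grads)):
        if i in selected:
            updated.append([w - lr * g for w, g in zip(w_row, g_row)])
        else:
            updated.append(list(w_row))  # frozen: copied unchanged
    return updated


if __name__ == "__main__":
    importance = [0.1, 0.9, 0.5, 0.3]
    chosen = select_neurons(importance, fraction=0.5)
    weights = [[1.0, 1.0] for _ in importance]
    grads = [[0.5, 0.5] for _ in importance]
    print(chosen, masked_update(weights, grads, chosen, lr=0.1))
```

Freezing the unselected rows leaves most of the pretrained model untouched, which is the intuition behind updating only a small, targeted subset when low-resource data is scarce.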
MT Adoption in the Government of Canada
Paper: Hand in <Hand> with the Machine: A Roadmap to Quality (Caroline-Soledad Mallette, Translation Bureau)
The Government of Canada’s Translation Bureau is one of the largest government organizations dedicated to translation. It has spent the last few years modernizing its technology infrastructure and drawing up an AI strategy. Through a series of proofs of concept as well as trial and error, a clear pathway to the future is taking shape.
The Translation Bureau’s Innovation Director, Caroline-Soledad Mallette, recounts lessons learned, surveys the lay of the land, and outlines best practices in the search for an adaptive, best-fit solution for technology-augmented linguistic services.
Register here for AMTA-2022.
The Association for Machine Translation in the Americas (AMTA) was founded in 1991 as the chapter of the International Association for Machine Translation in the Americas, with sister associations in Asia (AAMT) and Europe (EAMT). Together, they represent the most important global community dedicated to machine translation. The association’s biennial conference, AMTA 2022, will take place in Orlando, Florida, on September 12-16.