When humans translate a text, they work on small chunks, or phrases, one after the other and then link them together to form a complete translation of the source. Choosing the most appropriate translation of a given phrase depends on the meaning of the phrase itself, on the meaning of one or more of the immediately preceding phrases and on the wider context. But the specifics of the target language also need to be considered for a translation to be both adequate and fluent.
Current machine translation (MT) tools tackle the problem in a similar way. Phrase-based statistical machine translation (SMT), the most commonly used MT paradigm, relies on tables of tuples that pair a source phrase with a possible translation. Each tuple also carries a probability indicating how likely that translation is for the given source phrase. Given a large enough set of sentences in the source language together with their translations in the target language, it is relatively straightforward to build a phrase table that can then be used for translation. A phrase table alone, though, is not enough. Because source and target languages usually differ considerably in how sentences (or texts) are constructed, the system has to be built to reflect the specifics of the target language. For example, a word-reordering model determines a word order for the translation that mirrors the meaning of the source as closely as possible. Another challenge is subject-verb agreement: determining the correct verb form for the subject of a sentence. For these and many other issues, such as system expansion, system evaluation and tuning, numerous solutions have been proposed in recent years of MT research and development. In addition, thanks to globalization and the rapid expansion of the world wide web, enough bilingual data can be found for major language pairs to build quite elaborate SMT systems.
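To make the idea concrete, here is a minimal sketch of a phrase table in Python; the English-Dutch phrases and the probabilities are invented purely for illustration, and a real decoder combines many more models than this simple lookup.

```python
# A toy phrase table: each source phrase maps to candidate translations,
# each with a probability estimated from bilingual training data.
# All entries below are invented English-Dutch examples.
phrase_table = {
    "the house": [("het huis", 0.82), ("de woning", 0.18)],
    "is small": [("is klein", 0.90), ("is gering", 0.10)],
}

def best_translation(source_phrase):
    """Return the most probable target phrase for a known source phrase."""
    candidates = phrase_table.get(source_phrase, [])
    if not candidates:
        return None  # unknown phrase: a real system would back off to single words
    return max(candidates, key=lambda pair: pair[1])[0]

# Translate phrase by phrase and concatenate the pieces; a real decoder would
# also consult a reordering model and a language model.
segments = ["the house", "is small"]
print(" ".join(best_translation(s) for s in segments))  # het huis is klein
```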
Despite the advances in current SMT systems, their translation quality cannot be compared to human translation, and human post-processing is often required. Humans draw on their knowledge of the source and target languages, their experience and their intuition. They possess the intellectual plasticity needed to produce a translation of high quality. We adapt translations to suit the required style, to stay coherent with the particular domain and to be accessible to the intended audience.
But SMT is fast. While humans translate on average 1,000 to 4,000 words per day, a machine can translate 4,000 words in a minute. Furthermore, building an SMT system can take anywhere from a couple of hours to a couple of days. For humans, learning a foreign language is a hard task: it involves memorizing vocabulary and associating it with already known words, phrases or concepts; understanding grammar rules and their exceptions; and adopting the peculiarities of style and communication. It becomes even harder when the new language differs significantly from the base language, the mother tongue.
One way to incorporate intelligence into current MT systems is through human post-processing. The corrections that editors introduce into MT-generated text often concern details that machines cannot anticipate but that matter for the adequacy and fluency of the translation. Consider, for example, the translation of cooking recipes from British English into Dutch. On the source side of the bilingual training data, liquid measurements are given in fl oz (fluid ounces); on the target side, the same measurements appear in ml (milliliters), so the numbers themselves are converted as well. SMT systems often do not handle such cases correctly, but humans understand these types of errors and can easily correct them.
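As an illustration of the kind of correction involved, the sketch below applies a rough fluid-ounce-to-milliliter rule as a post-processing step; the conversion factor (1 imperial fl oz is roughly 28.41 ml), the regular expression and the rounding are simplifying assumptions, not the behavior of any particular MT tool.

```python
import re

# Rough post-editing rule: convert imperial fluid ounces to milliliters.
# 1 imperial fl oz is approximately 28.41 ml; formatting is simplified here.
FL_OZ_TO_ML = 28.41

def convert_fl_oz(text):
    """Replace quantities like '8 fl oz' with their milliliter equivalent."""
    def repl(match):
        ml = float(match.group(1)) * FL_OZ_TO_ML
        return f"{round(ml)} ml"
    return re.sub(r"(\d+(?:\.\d+)?)\s*fl\s*oz", repl, text)

print(convert_fl_oz("Add 8 fl oz of milk."))  # Add 227 ml of milk.
```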
Research into these types of errors is critical for advancing the quality and user experience of MT, so tools that improve the experience of editors in reviewing translations and evaluating quality will become more important. Editors can assess translation quality and edit the translated text segment by segment, improving the overall translation. The next step in improving MT is to use supervised or semi-supervised machine learning to feed the corrected segments back into the system so that it can educate and improve itself. Incremental retraining is one such technique: it adapts the phrase table of a phrase-based SMT system on the fly, without rebuilding the system from scratch, and thus forms the core of a self-improving MT framework.
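A minimal sketch of the idea, assuming the phrase table is backed by simple co-occurrence counts, might look like this; the counts and phrases are invented, and real systems use considerably more elaborate update schemes.

```python
from collections import defaultdict

# Co-occurrence counts behind a toy phrase table; in a real system these
# would come from the original bilingual training data.
counts = defaultdict(lambda: defaultdict(int))
counts["is small"]["is gering"] = 3
counts["is small"]["is klein"] = 1

def incorporate_post_edit(source_phrase, corrected_target):
    """Record one (source, corrected target) observation from an editor and
    re-estimate the translation probabilities for that source phrase."""
    counts[source_phrase][corrected_target] += 1
    total = sum(counts[source_phrase].values())
    return {target: c / total for target, c in counts[source_phrase].items()}

# Each corrected segment shifts the table toward the editor's choice
# without retraining the whole system from scratch.
for _ in range(5):
    probabilities = incorporate_post_edit("is small", "is klein")
print(probabilities)  # 'is klein' now outweighs 'is gering'
```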
A form of incremental retraining is already used in a tool called Lilt. Lilt uses input from customers to adapt to their style and language and thereby personalize further translations.
When it comes to new and revolutionary technologies in MT, a significant place is occupied by Neural MT, a novel paradigm based on deep learning. What makes Neural MT fascinating is that it exploits artificial neural networks (ANNs). ANNs were introduced in the late 1950s and were aimed at solving pattern recognition tasks. They are networks of simple computational units, or neurons. Each neuron performs a very simple computation on its inputs, such as a weighted summation followed by a squashing function, and produces an output that serves as input to the neurons in the next layer. ANNs attempt to simulate how the human brain learns and thinks, and for many tasks, such as classification, clustering and prediction, they have shown remarkable results. Tools such as AlexNet and the Wolfram Language Image Identification Project have astonishing image recognition capabilities thanks to these networks; Microsoft’s Cortana and Apple’s Siri excel at both understanding and generating speech. The way artificial neural networks learn is quite similar to the way biological ones do: feeding in training data forces the weights of the connections between neurons (their synapses) to adapt so that the output of the network is as correct as possible, where correctness is judged according to the intended task.
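The following toy sketch shows a single artificial neuron and a bare-bones weight-adaptation loop; it is a caricature of how training adjusts connection weights toward correct outputs, not a faithful training algorithm.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum of the inputs, followed by a
    squashing (sigmoid) activation that produces the neuron's output."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))

def adapt(inputs, weights, bias, target, rate=0.5):
    """Toy 'learning': nudge each weight so the output moves toward a target.
    This is a bare-bones stand-in for gradient-style weight adaptation."""
    out = neuron(inputs, weights, bias)
    error = target - out
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, bias + rate * error

weights, bias = [0.1, -0.2], 0.0
for _ in range(50):
    weights, bias = adapt([1.0, 0.5], weights, bias, target=0.9)
print(round(neuron([1.0, 0.5], weights, bias), 2))  # close to 0.9
```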
Neural MT (NMT) employs recurrent neural networks (RNNs). RNN models can be built with various freely available deep learning tools, such as Caffe, Theano or TensorFlow. Like SMT, NMT requires large amounts of bilingual data to train the neural networks. The trained network can then translate sentences from the source language into the target language.
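A minimal sketch of such an encoder-decoder network, assuming TensorFlow/Keras and placeholder vocabulary and layer sizes, might look like this:

```python
import tensorflow as tf

# Minimal RNN encoder-decoder sketch for translation; vocabulary sizes,
# embedding size and hidden size are placeholders, not tuned values.
SRC_VOCAB, TGT_VOCAB, EMB, HIDDEN = 8000, 8000, 256, 512

# Encoder: read the source sentence and summarize it in a hidden state.
src = tf.keras.Input(shape=(None,), dtype="int32")
enc_emb = tf.keras.layers.Embedding(SRC_VOCAB, EMB)(src)
_, enc_state = tf.keras.layers.GRU(HIDDEN, return_state=True)(enc_emb)

# Decoder: generate the target sentence conditioned on the encoder state.
tgt = tf.keras.Input(shape=(None,), dtype="int32")
dec_emb = tf.keras.layers.Embedding(TGT_VOCAB, EMB)(tgt)
dec_out = tf.keras.layers.GRU(HIDDEN, return_sequences=True)(
    dec_emb, initial_state=enc_state)
probs = tf.keras.layers.Dense(TGT_VOCAB, activation="softmax")(dec_out)

model = tf.keras.Model([src, tgt], probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) would then be run on large amounts of bilingual sentence pairs.
```

Training such a model on bilingual sentence pairs is what turns it into a translator; production NMT systems add attention mechanisms and beam-search decoding on top of this skeleton.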
By bringing neural networks into computer-based translation, Neural MT takes a step toward closing the gap between the human thought process and MT. It is still a young field, so we can expect upcoming advances to deliver substantial improvements in MT quality.
A quite different approach to simulating biological neural networks has been undertaken by the OpenWorm project. In an October 2015 TED talk, Stephen Larson described his company’s ongoing work on building a living organism inside a computer: a small roundworm called C. elegans. The worm’s size and body properties have allowed scientists to map its neural network completely, study its properties and design computer models of it. The OpenWorm project takes these models a step further: a complete organism has been created from computer code, with the sensory and motor functions typical of its biological prototype. While simulating the human brain in its full complexity remains far out of reach, the success of OpenWorm and the foundations it has laid raise the question of whether a system could be built that learns a language the way humans do. It also raises a further question: could this be a better approach to tackling the MT problem? Practice has shown that combining different types of technologies is often more beneficial than focusing on improving just one.
For humans, the process of learning a foreign language can be summarized by three main subprocesses: vocabulary memorization, grammar comprehension and style adoption. For a computer system, the first subprocess amounts to storing tuples of source and target words in conventional memory (magnetic or solid-state drives). Storing such relational data is a trivial task for computers: it is read/write efficient, durable and deterministic, meaning it is highly unlikely that the data read back will differ from what was stored, and redundant arrays of independent disks (RAID) are widely used to ensure reliability. Decision making in computer systems rests on their binary nature: something is either on or off, and with if-then-else control we can encode in a computer routine a wide range of rules, for example the grammatical or communication rules of a natural language. In this way we can implement the second and third subprocesses of language learning.
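A toy illustration of these subprocesses, with an invented English-Dutch lexicon and a deliberately simplified Dutch conjugation rule, might look like this:

```python
# Subprocess 1: vocabulary as stored source/target word pairs (kept in memory
# here; on disk this would simply be a database or key-value store).
lexicon = {"I": "ik", "you": "jij", "work": "werken"}

# Subprocesses 2 and 3: grammar and style as explicit if-then-else rules.
# The Dutch present-tense rule below is deliberately simplified.
def conjugate_dutch(infinitive, subject):
    stem = infinitive[:-2]        # 'werken' -> 'werk' (simplified stemming)
    if subject == "ik":
        return stem               # 'ik werk'
    elif subject == "jij":
        return stem + "t"         # 'jij werkt'
    else:
        return infinitive         # fall back to the infinitive form

subject = lexicon["I"]
print(subject, conjugate_dutch(lexicon["work"], subject))  # ik werk
```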
While the human brain cannot “store” memories with the capacity, accuracy and durability of a computer system, when it comes to language and translation it can reason in a way that is far superior to any conventional computer system. So combining an approach like the one taken by the OpenWorm project, or the Neural MT approach, with a conventional computer system for storing and retrieving data may be the way to build an MT system that truly masters language. Such a system would need, first, a memory unit for storing and retrieving data, built like a conventional computer system, and second, a reasoning unit built from advanced artificial neural networks.
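Schematically, such a hybrid system might be sketched as follows; the class and method names are invented, and the reasoning unit is only a placeholder for a trained neural model.

```python
class MemoryUnit:
    """Conventional storage: exact, durable lookup of source/target pairs."""
    def __init__(self):
        self._store = {}
    def remember(self, source, target):
        self._store[source] = target
    def recall(self, source):
        return self._store.get(source)

class ReasoningUnit:
    """Placeholder for a trained neural model that composes a translation
    when exact recall fails; here it only flags the gap."""
    def translate(self, source):
        return f"<needs reasoning: {source}>"

class HybridTranslator:
    def __init__(self):
        self.memory = MemoryUnit()
        self.reasoning = ReasoningUnit()
    def translate(self, source):
        return self.memory.recall(source) or self.reasoning.translate(source)

mt = HybridTranslator()
mt.memory.remember("good morning", "goedemorgen")
print(mt.translate("good morning"))   # exact recall from the memory unit
print(mt.translate("good evening"))   # falls through to the reasoning unit
```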
The challenge now is to devise a suitable model of a reasoning mechanism for language; as history has shown, implementation is the easier part.