Focus
IT context of human language translation
David Filip
David Filip is a researcher in next generation localization project and process management and an interoperability standardization expert. The underlying research was supported by Science Foundation Ireland as part of the ADAPT Centre at Trinity College Dublin. He is a current member of the MultiLingual editorial board.
Ireland has been perceived since the 1980s as a global capital of industrial translation and localization. This happened largely because global multinationals such as Oracle, Microsoft, Google and Facebook were happy to headquarter their globalization and internationalization efforts (for Europe, the Middle East and Africa if not globally) in a friendly, English-speaking EU country, a trend becoming even stronger with Brexit.
Yet there is a far wider IT context in which we need to look at human languages and translation, well beyond Ireland or the EU. There’s a rich relationship between human natural languages and reasoning automation ideas that ultimately lead to the formation of computer science in the industrial era.
Human language has abstract semantics and among other things, it allows us humans to make logical inferences. Inferences, or the ability to relate thoughts and make conclusions, form the baseline for interpersonal communication. Perhaps somewhat surprisingly, it can be argued that all philosophical and logical developments that eventually led to the creation of computer science were initially founded in the systematic study of language. This happened from antiquity, through medieval scholastics, up through the modern industrial and post-industrial era, when the automation ideas of 17th century philosophers gradually became implementable.
Aristotle was the first thinker known to have noticed, in his Prior Analytics, that language-based reasoning relies on an abstract inner structure of thoughts. He founded the theory of quantifiers by recognizing statements made in general (about all members of a scope or universe) or in particular (about at least one member of a scope or universe). He noticed that pairs of input thoughts can be evaluated against any single supposed conclusion in a rule-based or automated way (evaluating the abstract thought structure independent of the actual psychological, spoken or written instances). Inferences thus structured are called syllogisms, and from the modern point of view, they cover just a small fraction of possible reasoning structures. But even these can cover a lot more when recursively chained. Additionally, the Stoics recognized the nature of logical connectives such as and, or and either/or. Again, as with Aristotle, they glimpsed the abstract structure behind human language and laid the foundation of so-called propositional calculus.
These efforts were summarized in the 17th century, in Leibniz's unpublished works on rational calculus, and resulted in the foundation of modern symbolic logic in the 19th and 20th centuries. Without symbolic logic and formal logical methods, the foundation of computer science by Alan Turing and company in the 1940s is simply unthinkable. Each and every modern programming or data modeling language in existence uses a subset of symbolic logic notions stemming from this tradition, such as the conditional (if-then), the biconditional (if and only if), conjunction (all cases at the same time), disjunction (at least one of the cases), negation (not having a property) and predication of properties (attributes) to objects (subjects, individuals, members of the universe). Pretty much all programming is based on testing whether an object has an attribute (property) and then doing something based on the outcome. The outcome is typically binary (and very often recursively nested), and systems largely differ only in how they treat the undefined cases.
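As a minimal, purely illustrative sketch (the records and field names are invented for the example, not taken from any real system), here is how everyday Python code leans on exactly these notions, with Aristotle's two quantifiers surviving as all() and any():

```python
# Hypothetical translation-segment records; the field names are invented.
segments = [
    {"source": "Hello world", "translated": True, "reviewed": True},
    {"source": "Goodbye", "translated": True, "reviewed": False},
    {"source": "Thanks", "translated": False},  # 'reviewed' is undefined here
]

for seg in segments:
    # Predication: test whether the object has a property, handling the undefined case.
    reviewed = seg.get("reviewed", False)
    # Conditional with conjunction and negation: if translated and not yet reviewed, flag it.
    if seg["translated"] and not reviewed:
        print("Needs review:", seg["source"])

# Quantifiers: a statement about all members of the universe, and about at least one.
print(all(s["translated"] for s in segments))           # "all segments are translated" -> False
print(any(s.get("reviewed", False) for s in segments))  # "some segment is reviewed" -> True
```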
Language is a baseline characteristic of the human as a social animal. Symbolic representations of languages, and the unlimited exchange of abstract ideas they facilitate, are what give us the ultimate advantage over other intelligent mammals such as dolphins or apes. The fact that language encodes interpersonal abstract thoughts underlies not only the human ability to reason, but also the human ability to translate from one human natural language to another.
As a corollary, there is a fundamental difference between a human translating and a computer translating. Even the most advanced neural machine translation (NMT) algorithms, running on the largest graphics processing unit clusters, are performing operations on language as instantiated with a specific syntax. Deep learning (DL) algorithms may glimpse functional dependencies between and among syntactic forms expressing thoughts in languages, so that they sometimes cater to semantic and pragmatic factors. But this is a matter of chance, in the sense that we know the computer did not decode the abstract thought behind the syntax of a specific sentence, and did not consider whether the intended thought might have been affected by semantic or pragmatic relationships. It simply calculated that there is a high chance a certain string of characters or sounds in one language means the same as another string of characters or sounds in another language.
The machine did not perform the leap to the semantic and pragmatic levels to reconstruct the source meaning, in its pragmatic context, in the target language. It merely used advanced but opaque statistical methods to perform complex syntactic operations on strings of characters, never leaving the syntactic level.
Human translators of course make mistakes, but their mistakes are of a fundamentally different kind. The machine produces both its correct and its incorrect translations by chance. The algorithm designer tries to push the chance of producing a correct translation as close to 100% as possible, and the chance of producing a wrong translation as close to 0% as possible. However, achieving those limits is impossible unless new semantic and pragmatic interference can be excluded.
Language study is indispensable to the latest technological developments. Human language is the preferred interface between humans and technology. Therefore, the development of human-language-based inference, communication and decision-making capabilities in computers is a critical strand of current AI research.
Unfortunately, the current hype around DL and applications of neural networks in general leads to many misconceptions. One of the most dangerous misconceptions among nontechie decision makers is that, somehow, data exchange formats and interoperability standardization are becoming less important with the advent of AI. Nothing could be further from the truth. In fact, strict formalism, standardization and modular additivity of algorithms are what make it at all possible to have transparent AI and to keep humans in the loop; to augment human capabilities with AI rather than making people involuntary slaves of some opaque machine learning (ML) driven technology. After all, as explained above, none of the current ML methods ever leaves the syntactic plane, so it is entirely absurd that such methods would make standardization of formats obsolete. Formats in general are defined by a vocabulary (what the primary components are and what thought each of them expresses) and a grammar (the syntactic rules for using the vocabulary; among other things, how to construct more complex expressions from the primary components).
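To make the vocabulary/grammar distinction concrete, here is a minimal sketch built around a toy, invented exchange format (it does not correspond to any real standard): the vocabulary says which names exist and what each one means, and the grammar says how they may be combined.

```python
# A toy, invented exchange format: the vocabulary lists the allowed element names
# and their meanings; the grammar states which children each element may contain.
VOCABULARY = {
    "file": "container for one translatable document",
    "unit": "one translatable segment",
    "source": "original text of a unit",
    "target": "translated text of a unit",
}

GRAMMAR = {
    "file": {"unit"},
    "unit": {"source", "target"},
    "source": set(),
    "target": set(),
}

def valid(element):
    """Check a (name, children) tree against the vocabulary and the grammar."""
    name, children = element
    if name not in VOCABULARY:
        return False  # unknown word: a vocabulary violation
    if any(child[0] not in GRAMMAR[name] for child in children):
        return False  # wrong nesting: a grammar violation
    return all(valid(child) for child in children)

doc = ("file", [("unit", [("source", []), ("target", [])])])
print(valid(doc))  # True
```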
It is important to repeat that human language has semantics, and moreover human language is acquired by individuals in a pragmatic context. Thus, human language can be reduced neither to pure syntactic rules nor to pure semantics; human language is always overloaded with pragmatics. This is why ML techniques can never crack it for good: no matter how deep the neural network, the results of a deep learning algorithm are always functional, even if the "designer" of that system might well not be able to tell what the function is, or what it will end up being, based on the data fed to the machine during its training.

In various contexts, neural network based algorithms can deliver better results in sensing or decision making than a human can, and it is particularly easy to beat a human not trained in a decision-making task, because such a system has a bigger and faster capacity to absorb data. However, such a system doesn't understand the data it was fed; it merely performs statistical calculations on the syntax, and both the semantic and pragmatic levels remain out of bounds for any ML system, deep learning or not. This is the ultimate rationale for the human in the loop and for using ML methods to enhance human capabilities rather than to replace them.

Perhaps the most famous case is centaur chess. After being defeated in chess by Deep Blue, Garry Kasparov took to centaur chess, and no unassisted chess computer can beat a human grand master assisted by a comparable AI. Similarly, Go game theory was greatly enriched by the interaction between human players and AlphaGo, the computer that beat the South Korean grandmaster Lee Sedol (9th dan). Fan Hui (2nd dan), who played AlphaGo before Lee Sedol did, admitted he became a far better Go player with better strategic foresight after playing AlphaGo; indeed, his worldwide ranking jumped from the 600s to the 300s.
Interestingly, AlphaGo's training was a great example of a technique called the generative adversarial network (GAN). The technique is used to great advantage in making sensing systems more robust in the face of ever-improving fake input data. To explain it very simply, AlphaGo acquired its initial amateur level by playing amateur online matches, then trained to master level by playing itself a zillion times. GAN hasn't been heavily used in machine translation (MT) so far, but the first attempts have been published. The question is whether GAN can be as efficient for an open-system problem such as human language translation as it is for a closed system with a clear win condition, which Go is, or for a binary sensing problem (horse or not a horse, the authorized user of this computer or not).
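As a purely numerical toy sketch of the adversarial idea (this is not how AlphaGo or any production MT system is trained, and the distributions, learning rate and parameter names are invented), a one-parameter-pair generator learns to mimic samples from a normal distribution while a logistic discriminator tries to tell real samples from generated ones:

```python
# Toy GAN in plain NumPy: generator G(z) = w_g*z + b_g tries to mimic samples from
# N(4, 1); discriminator D(x) = sigmoid(w_d*x + b_d) tries to tell real from fake.
import numpy as np

rng = np.random.default_rng(0)
w_d, b_d = 0.1, 0.0   # discriminator parameters
w_g, b_g = 1.0, 0.0   # generator parameters
lr, batch = 0.01, 64

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for step in range(5000):
    x_real = rng.normal(4.0, 1.0, batch)   # "real" samples the generator must mimic
    z = rng.normal(0.0, 1.0, batch)
    x_fake = w_g * z + b_g                 # generated ("fake") samples

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    d_real = sigmoid(w_d * x_real + b_d)
    d_fake = sigmoid(w_d * x_fake + b_d)
    w_d -= lr * np.mean(-(1 - d_real) * x_real + d_fake * x_fake)
    b_d -= lr * np.mean(-(1 - d_real) + d_fake)

    # Generator step: push D(fake) toward 1, i.e. learn to fool the discriminator.
    d_fake = sigmoid(w_d * x_fake + b_d)
    w_g -= lr * np.mean(-(1 - d_fake) * w_d * z)
    b_g -= lr * np.mean(-(1 - d_fake) * w_d)

print(b_g, abs(w_g))   # the generated mean and spread should drift toward 4 and 1
```

Note that each side only ever sees and manipulates numbers; in the terms used above, neither side has any notion of what the samples mean.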
To conclude this arc, NMT systems are still statistical systems, albeit statistical systems that are less explainable and require much more hardware to run. Even GAN systems are still statistical, although they can make great leaps toward very low error rates with limited data. Because those systems are statistical and not capable of semantic insight, it is important to standardize the syntax of their input to increase their chance of performing at or beyond human parity. Finally, a standardized format is critical not only at input: the AI also needs a way to store and display its results, which again cannot be addressed without a format definition, and that definition is of course better standardized than proprietary if you want to exchange and display (render) the information.
For instance, the Internationalization Tag Set (ITS) 2.0, the W3C internationalization and localization metadata standard that MultiLingual readers know from our biennial series of Localization Standards Readers, has been listed in the JTC 1 Big Data Standards Roadmap as a key enabler for automated processing of human language within Big Data and AI architectures:
“The ITS 2.0 specification enhances the foundation [XML and HTML 5] to integrate automated processing of human language into core Web technologies and concepts that are designed to foster the automated creation and processing of multilingual Web content.”
Analytic quality evaluation in translation has been a long-term concern in our industry. I applaud Multidimensional Quality Metrics (MQM) in particular, because it seems that it might have finally succeeded in explaining to industry stakeholders that there is no concept of quality without specified expectations (requirements to be fulfilled in order to serve a specific purpose). It also makes clear that even MQM can only be applied if you subset it based on a short list of meaningful requirements. So far so good. However, it's not enough to do analytic quality assessment; on its own, it won't improve anything. The real incentive to perform quality assessment in an analytic way is to create lessons learnt and to improve quality by applying them in future iterations. This is something that current MQM implementations underestimate or outright ignore.

MQM was forked off and expanded from the ITS 2.0 data category Localization Quality Issue (LQI). ITS data category information is designed to be injected into any native format locally, inline, so that it makes sense in context. MQM unfortunately broke its ties with ITS LQI and has no other standardized mechanism for being recorded locally, inline, so that it immediately makes sense to the language workers dealing with it. Good MQM data, inline and in context, could be a great boost for ML and DL methods, and maybe also a great input for GAN techniques. Having MQM data in separate databases or even spreadsheets, totally out of context, is not nearly as useful.
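To illustrate what "locally, inline" means, here is an invented HTML fragment annotated with the ITS 2.0 Localization Quality Issue local markup, plus a minimal Python sketch of how a tool might read the issues back out in context. The sentence, the issue values and the small collector class are made up for the example; only the its-loc-quality-issue-* attribute names come from the ITS 2.0 specification.

```python
# Invented HTML fragment carrying inline ITS 2.0 Localization Quality Issue markup.
from html.parser import HTMLParser

FRAGMENT = """
<p>The server <span its-loc-quality-issue-type="mistranslation"
                    its-loc-quality-issue-severity="80"
                    its-loc-quality-issue-comment="Should be 'waiter', not a computer server">
computer</span> brought the menu.</p>
"""

class LQICollector(HTMLParser):
    """Collect inline localization quality issues so they can be reviewed in context."""
    def __init__(self):
        super().__init__()
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "its-loc-quality-issue-type" in attrs:
            self.issues.append(attrs)

collector = LQICollector()
collector.feed(FRAGMENT)
for issue in collector.issues:
    print(issue["its-loc-quality-issue-type"], "-", issue["its-loc-quality-issue-comment"])
```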
In later articles in this series, we will explain how a modern enterprise needs a standards-based multilingual content strategy, and we will make some deep dives into particular areas of needed standardization in the context of organizations' localization maturity.