When I think of Michel de Nostredame (Nostradamus), the French astrologer, physician and reputed seer, I picture a delusional character who lived in the 16th century, writing with a quill in a candle-lit room. Every time I hear about the many disturbing predictions he made, I get the chills. Thankfully, my more practical side quickly takes over and I tell myself, “forget about it, this is not real.”
His real prophecies are compelling, but there may also be “prophecies” attributed to him that he didn’t actually make. According to the uncited internet, Nostradamus wrote, “After the invention of a new engine, the world will be like in the days before Babel,” a setting and period when the world spoke one language, as told by the Book of Genesis.
Is there an engine capable of creating a world language that all can understand? The answer is yes. What Nostradamus couldn’t know is that more than one type of engine would eventually emerge from all the work and research done within the fields of computational linguistics and localization.
The internet has given rise to a proliferation of communications that occur each day in every corner of the globe, carried out by people engaged in conversations representing the majority of the world’s approximately 6,700 living languages. With each passing day, volumes of additional translated content become available, providing for the creation of more dictionaries for use in rule-based machine translation engines, as well as translated data that both statistical machine translation and hybrid engines can learn from. As a result, we’ve seen huge improvements in many of the more common language pairs, such as English to French and English to Chinese.
Going forward, the biggest challenge to achieving a comprehensive single language reality will be the work necessary to create engines for less common language pairs, such as Hindi to Spanish or Portuguese to Chinese. Even at the accelerated pace that’s been achieved more recently, it’s likely to take a long time to create engines capable of producing quality translations for these pairs.
Additional challenges stem from many of the practical issues that have long plagued the localization industry. Just as with human translators, machine translation (MT) engines will continue to struggle with ambiguous words; accurately translating literature; recognizing and adapting for constantly evolving cultural language nuances; properly conveying true meaning and emotion versus providing simple translation for words and sentences; and much more. Some things are, and will remain, untranslatable.
But it is only a matter of time before we have enough human-translated data to create engines covering different topics for the less common language pairs. Covering multiple directions and multiple topics could take another 20 years if we continue using the same army of computational scientists, translators and developers working on these robotic language technologies.
The more you know your enemy, the more prepared you are to fight it. Although Nostradamus’ prediction used to give me the chills, I’ve come to understand that these engines are fed with real human translations. And they continue to learn as people keep using the language. I no longer view MT engines as a threat to the profession. I now believe they will transform the industry and create new opportunities. The use of computers for linguistic research and applications is a discipline that I highly encourage students to explore. For those who are passionate about engineering and languages, including high school students, or those with existing degrees in English or other languages, this is a career path with few competitors.