RESEARCH

Not Uyoga, Not Translation
A first look at SeamlessM4T in 14 languages

Martin Benjamin

As the hottest summer ever experienced by Homo sapiens approached its end at the Meta headquarters in California, the world’s ninth-largest corporation announced a new achievement for cyber sapiens: the introduction of SeamlessM4T — “the first all-in-one multimodal and multilingual AI translation model that allows people to communicate effortlessly through speech and text across different languages.” The technology, they say, supports translation among nearly 100 languages and builds “a universal language translator,” with the firm implication that they have accomplished this feat for the many languages in their quiver. According to a Meta blog:

“For these tasks and languages, SeamlessM4T achieves state-of-the-art results for nearly 100 languages and multitask support across automatic speech recognition, speech-to-text, speech-to-speech, text-to-speech, and text-to-text translation — all in a single model. We also significantly improve performance for low- and mid-resource languages supported and maintain strong performance on high-resource languages.”

Seamless translation among almost 100 languages is a bold claim, which is not backed up by any cited research. Does the announcement hold water from a scientific perspective, or did the publicity get out over its skis? With a market cap of over three-quarters of a trillion dollars, Meta chose not to share reviewable results for a translation system that they presumably intend will lead to future profits. This is marketing, a shot across the bow of Google Translate and Microsoft’s Bing, with the unsubstantiated sotto voce comment, “SeamlessM4T also outperforms previous state-of-the-art competitors.”

Having previously evaluated Google Translate across most of the languages in its grid, with the results online, I was not content to take Meta’s intimation of translation Nirvana at Facebook value. With neither budget nor spare time to sink into the task, I ran the simplest possible test: searching re-branded Twitter for a multiword expression that my testing shows consistently confounds its competitors — “the spring in her step.” I found a tweet in well-formed everyday English, had Seamless translate that text into three languages (French, Romanian, and Swahili) that could be evaluated by fluent speakers in house on a “does the translation convey the meaning of the original” basis, and spent a good many hours making a video to share the results of the experiment.

The executive summary of the preliminary experiment was that, from English to other languages, Seamless seems to produce equivalents that one could use in low-stakes situations, where the consequences of a blown expression are limited to giggles or confusion. That is, as with Google or DeepL, Seamless translations could likely get you through a conversation with the neighbor on your next flight, but they should under no circumstances be relied on in situations where accuracy matters, such as medicine, law, or business.

That said, I wanted to build on my preliminary research with a comparative evaluation across 14 languages. The research is not intended as a complete analysis of Seamless across all language pairs, which would be an extremely difficult, expensive, and time-consuming endeavor. Rather, it follows the principle that a representative sample is indicative of the performance of the whole. Seamless offers 36 languages in its public demo. Analyzing whether translations from one of those languages to 14 of the others were understandable by human readers is a fair test of the system; we can posit that the quality will be roughly equivalent in other configurations, while also allowing that some cases might be substantially more intelligible and other cases substantially less so. Below I provide a full explication of my methodology, both so that similar investigations can be performed with other languages and so that other researchers can critique the methodology and revisit the evaluation task with a different procedure.

I should highlight three findings at the outset:

  1. The speech-to-text engine showed consistent excellence in transcribing the spoken word. This was not the point of the study and was not subject to rigorous testing. However, I would like to flag this observationally as a substantial achievement that will enhance NLP and language technology services for many languages going forward.
  2. In all cases, there were serious flaws in the translations that rendered them useless as actual information. Words were rendered. Meaning was not.
  3. Somewhat by accident, researching this article led to the discovery that a substantial part of the source material for LLMs, such as those used by SeamlessM4T for languages other than English, is computer-generated web text, usually via Google Translate. For example:
  • The first language in Seamless’s offering is Bengali.
  • The entire website for Chicago O’Hare International Airport is offered in automatic translation via Google’s website translation widget.
  • All of that content is part of the galaxy that is harvested by robots crawling the web.
  • An untold number of other websites embed the same commercial service from Google Translate.
  • This string of text from the “Bengali” section on accessibility is definitely not Bengali as produced or vetted by a human: বিমানবন্দর জুড়ে অবস্থিত সমস্ত বিশ্রামাগারে হুইলচেয়ার অ্যাক্সেসযোগ্য সুবিধা রয়েছে। এছাড়াও, সমস্ত টার্মিনালে সমস্ত জেন্ডার বিশ্রামাগার রয়েছে। It can, however, be interpreted as a message that the restrooms located throughout the airport have wheelchair-accessible facilities and that all terminals have all-gender restrooms.

Anyone can install the widget on their website — a quick Google search turns up tutorials and videos showing how to “translate” your content into over 100 languages in a few simple steps. The pollution of the global linguistic data seas might be the most important revelation of this article, but it was not put to any systematic testing herein. Therefore, I introduce the initial evidence below and invite you to join me in tugging the thread for further investigation.

Experimental considerations in multilingual evaluation

Extensive multilingual testing of a system like Seamless is a difficult undertaking. From English to other languages, it is easy enough to generate parallel translations from an identical starting text, but even this runs into innumerable problems. What is a “fair” text — IMs or Wikipedia or Shakespeare? In the case of Seamless, test translations are limited to 15 seconds of spoken words, rendered in three languages at a time, with frequent system errors that require re- and re-re- and re-re-re-recording, before being copy-pasted into emails and direct messages and spreadsheets. We can talk about gold standard metrics until you’re BLEU in the face. Do you test five languages, or 50, or aim for the whole basket? What resources do you have to chase down competent evaluators for even major languages like Bengali, much less the previously ignored languages that Meta is admirably advancing into the technological universe for the first time, like North Azerbaijani? Each evaluator must be competent in both English and the test language, be accessible and responsive, understand the task in the same way, and, unless you have some way of paying small bounties in dozens of currencies in dozens of countries, be willing to volunteer their time.

Reversing the language direction greatly increases the complexity of the evaluation problem — double the pairs, but an order of magnitude more work. I’m pretty sure nobody on the planet can assess whether a text chosen to test North Azerbaijani to English has difficulty equivalent to one chosen to test Bengali to English, or whether those are equivalent to one chosen to test Croatian to English.

Now take English out of the equation, because that is the claim Meta is making — essentially seamless translation among 100 languages. This increases the challenge exponentially. Find a text in North Azerbaijani that SeamlessM4T can translate to Bengali, and a method to ascertain whether or not the translated text conveys the same meaning as the original. With a budget from Facebook, you could do this for each language pair, in both directions, or at least a randomly selected sample. Or, you could throw up your hands and say, “Meta says, ‘AI something something strong performance something something 443,000 hours of speech something something text-to-unit model something something.’ Oh hell, it must be as good as they say.”

Or, we can run some tests that probe around the edges and answer the question of whether the system is producing seamless translations, more than anecdotally but short of comprehensively. For those who object that my testing regimen does not have the rigor of a full test of Seamless for all translation scenarios, I propose that it should not be the role of an independent researcher to disprove the claims of a corporate press release. When a corporation makes claims about its product, it should be accountable for providing peer-reviewable proof of those claims before being given any credence. For disproof of inflated claims, there is a lower threshold — once a few balloons are shown to pop when poked, that is enough, without poking every balloon that the corporation has floated. When an experimental drug is shown to cause debilitating side effects in some portion of participants in a clinical trial, the plug is pulled on the trial, without waiting for evidence that the drug might be safe and effective for certain use cases among a certain subset of the population. The point of peer review is not for the peer to validate the research claims, so much as to assess whether the claims could be valid, or are invalid prima facie. Without being able to definitively inspect Meta’s truth claim of seamless translation from North Azerbaijani to Croatian, my tests address the latter standard.

Poisonous mushrooms and translation

Please entertain a small digression. One evening in 1995, I went camping with a British forester in Miombo woodland on a beautiful highland escarpment. For a couple of weeks every year, a delicious mushroom the size of a plate bursts forth in the forest and, when grilled, is as sumptuous as carnivores find a steak. The Swahili word for mushroom is “uyoga,” we both knew. After we’d set up our tent, but before dark set in, we went for a stroll and came across a mushroom as big as a plate that looked like the ones we’d been enjoying from the market. Was it edible? Was it hallucinogenic? Would it kill us? As we debated the risk of bringing it back to camp, a woman happened by from the nearby village. “Don’t eat that!” she cried. “It’s not uyoga!” At that moment, the forester and I learned that Swahili has at least two categories where English has one — uyoga, which are edible mushrooms, and things other than uyoga, which I still don’t know how to express. Words matter — without that kind grandmother’s warning, we could have ended up with Timothy Leary on a trip to outer space we had no wish to take.

Nearly three decades later, a headline in Fortune reads, “Mycologists [fungi scientists] warn of ‘life or death’ consequences as foraging guides written with AI chatbots crop up on Amazon.” The article is behind a paywall, but the title makes it clear that even for English (the maximal scenario, with training on billions of tokens extracted from English text), the cousins of SeamlessM4T will direct you unguardedly toward things that look like edible mushrooms but can in fact kill you. Seamless has far less training data for even the best-furnished languages of wealthy nations, and much less again for languages like Swahili. To scale outside of English, Meta says, “We build upon our pioneering work on text-to-text mining using a similarity measure in a joint embedding space … Mining is performed in data from publicly available repositories of web data (tens of billions of sentences).”

Mining is performed in data from publicly available repositories of web data.
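To make the mechanics of that statement concrete, here is a minimal sketch of embedding-space bitext mining under stated assumptions: I use the publicly available LaBSE model through the sentence-transformers library as a stand-in for Meta’s own encoders, with an arbitrary score threshold. The essential point is that sentences from two languages are projected into one shared vector space, and the nearest cross-lingual neighbors are harvested as “parallel” data, with no human confirming any pairing.

    # A sketch of similarity-based bitext mining, NOT Meta's actual pipeline.
    # LaBSE via sentence-transformers stands in for Meta's own encoders.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/LaBSE")

    english = ["Elephants are large animals.", "Do not eat that mushroom."]
    swahili = ["Tembo ni wanyama wakubwa.", "Usile uyoga huo."]

    # Embed both sides into the same multilingual space, unit-normalized so
    # that a dot product equals cosine similarity.
    emb_en = model.encode(english, normalize_embeddings=True)
    emb_sw = model.encode(swahili, normalize_embeddings=True)
    scores = emb_en @ emb_sw.T  # similarity for every cross-lingual pair

    # Harvest the best-scoring candidate for each sentence as "parallel" text.
    # If machine-translated web pages sit close in this space (they do, since
    # MT output is built to be similar), they get mined as if human-made.
    for i, sentence in enumerate(english):
        j = int(np.argmax(scores[i]))
        if scores[i, j] > 0.7:  # arbitrary threshold for this sketch
            print(f"mined pair: {sentence!r} <-> {swahili[j]!r}")

The crawl never asks where the web text came from; that is the seam through which machine-generated pages flow into the training data.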

To understand the implications of this statement, we must journey away from the mountain woodlands of southern Tanzania, and go spelunking in the seams of SeamlessM4T’s Swahili mines. Enter this prompt in your Google search bar: uyoga hatari sumu. (“Hatari” is danger, and also the title of a 1962 John Wayne movie about white people and big game in Africa that was filmed in Tanzania; “sumu” is poison.) Examine the first five results. You are looking at a scandal.

Of the first five results for a search about poisonous mushrooms in Swahili, four are the output of Google Translate or Bing. The fourth article on the list is a genuine story from a writer for the BBC Swahili service about a young Afghan refugee in Poland who received a liver transplant after eating a poisonous mushroom. The other four are articles that purport to present factual information about recognizing and avoiding poisonous mushrooms. I located the original Spanish text that was the source for the “Swahili” article — written by someone named Germán Portillo, who was rendered on the Swahili side as Portillo of Germany. Rains that occurred in the springtime on the Spanish side (the rains in Spain fall mainly in “primavera”) occurred in a mineral spring (“chemchemi”) in translation. Pasting the first Spanish paragraph into Google Translate results in a Swahili conversion that is nearly identical to the article on the web. The other three articles are also all plainly the result of machine translation.
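That comparison can be made systematic with a few lines of code. This is a sketch of the check I performed by eye, using only Python’s standard library; the two strings are placeholders for the paragraph as published on the web and the output obtained by pasting the identified source into an MT engine (I used the Google Translate web interface by hand, so no engine call appears here).

    from difflib import SequenceMatcher

    def mt_similarity(published: str, engine_output: str) -> float:
        """Similarity ratio (0 to 1) between web text and MT engine output."""
        return SequenceMatcher(None, published, engine_output).ratio()

    published = "..."      # the "Swahili" paragraph as it appears online
    engine_output = "..."  # the identified Spanish source, machine-translated

    # Two independent human translations of one source rarely approach 1.0;
    # a web page that matches an engine's output nearly character for
    # character was almost certainly generated by that engine.
    print(mt_similarity(published, engine_output))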

These articles constitute the publicly available web repository for Swahili. Swahili mining is performed in data from publicly available repositories of web data. In other words, for one arbitrary topic, four out of five of the top “Swahili” texts on the web about poisonous mushrooms are the unintelligible output of machine translation. I have not done the research to say that 80% of Swahili on the internet is pure hokum, but I can say that a lot of websites use Google Translate to publish converted text in over 100 languages, and that the hokum from those websites then appears in Google Search as “authentic text” without being labeled as MT. Meta has now mined those websites as the source for their LLMs. This detour may be the most important finding of this article, both for what it says about the sources of the current generation of AI and for what it portends as more and more of the web’s content is AI generated. Like the fake mushroom foraging guides sprouting up on Amazon, it is certainly the most deeply concerning.

Translation is not a mathematical game of how closely the words or characters in a given n-gram can match a predicted gold standard. Translation is achieved when the meaning of words in one language is understood by someone in another language. The boundary line between translation and not translation is sometimes subjective, as the US Supreme Court justice Potter Stewart once said about judging whether an image is pornography — “I know it when I see it” — but it is often the difference between uyoga and not uyoga. Either it’s translation, or it’s word salad. Either it’s uyoga, or you die.

Experimental parameters

With that in mind, I devised a small experiment that could show, beyond the impressionistic but without attempting to be numerical, whether SeamlessM4T is producing translation. The basic idea is to run a test across non-English pairs that can conclusively demonstrate whether or not the system is producing something that can be understood on the other side. Without doing the intensive comparative research that Meta chooses not to finance, we can nevertheless see whether patterns start to emerge. Uyoga or not uyoga.

The Seamless demo prompts guests to jump in by saying, “Need some ideas? Try saying, ‘My favorite animal is the elephant.’” On the basis of that suggestion, I visited the Swahili Wikipedia page for elephant and chose two sentences that I could comfortably read in the 15 seconds that Seamless allows.

Tembo wakubwa hawana wanyama wanaowawinda ingawa simba huweza kuwashambulia ndama wa tembo na tembo wadhaifu. Japokuwa hutishiwa sana na kuwa hatarini kutokana na mwingiliano na binadamu na ujangili.

I then consulted a Kenyan alumnus of Kamusi Labs, who has conducted graduate studies in English, to land on what an L1-Swahili/ advanced L2-English speaker and an L1-English/ advanced L2-Swahili speaker together consider to be a well-rendered translation.

Adult elephants have no natural predators, although lions may attack elephant calves or weak elephants. Nevertheless, they are highly endangered and threatened because of conflict with humans as well as poaching.

The Seamless demo does not currently accept written text, so I had to read the selection into my computer microphone. Here, I will sing all praises to the software. It always recognized the language as Swahili, and produced near-perfect transcriptions every time — regularly stumbling only on “japokuwa” no matter how carefully I enunciated, signaling an out-of-vocabulary data issue. We can debate whether my foreigner’s Swahili is the best to evaluate their speech recognition (someone who heard me on the BBC Swahili service once described my speech as “bookish,” while someone on the coast who first heard me talking on the other side of a wall asked if her family was being visited by someone from the inland region where I had in fact been living), but hats off to the Meta engineers and the Swahili team on a product that performs brilliantly even when the Swahili pronunciation it receives is less than brilliant. To be clear, speech recognition is the work of mechanical processing, not artificial intelligence. Because their speech recognition is stellar, it adds minimal noise to their translations.

The AI component is the translation. In this, for non-English pairs, Seamless is usually rubbing together what it knows about two languages for which it has no parallel text, though it is more than possible that direct training material exists for certain major pairs. German to French data could be in the mix; Swahili to Hindi, certainly not. (I am not prepared to say anything about the role of English in this transaction on Seamless. For Google Translate, Bing, and DeepL, please read “How GT pivots through English.”) I ran the translation for 13 languages, selected rather arbitrarily according to whom I could contact quickly across my networks, as well as English. For French, Romanian, and Japanese, the translators were L1 native speakers in both languages, which can be considered the platinum standard. (For the first two, the victim was my 13-year-old daughter, who has grown up perfectly trilingual with a US-PhD’d Romanian mother, French immersion in daycare and school and among friends starting at 13 months, and the notion that fun time with her dad should often involve tests of her proficiency with words like “idiosyncratic” for US college entrance exams; though she lacks domain vocabulary for technical translations, her intuitive translations among her languages are at least on par with most professional simultaneous interpretation I’ve heard through headphones at international conferences.) Of the remaining evaluators, two were Americans who had lived in the Netherlands and Germany for decades, married local men, and worked professionally as translators; for German, we add an Ivy League PhD in linguistics. The others were native speakers of their test language who had university degrees that required advanced skills in English, running the gamut from astrophysics prof to UN brass. Polish was translated by a senior professor who is also conversant in Swahili, and the Danish by a development professional who also knows Swahili, but neither was shown the original until after they’d submitted their translations. To most of those people, I wrote the following:

I’m doing a little test of the new AI translation software from Meta/ Facebook. Would you have a moment to translate this output into English? The original text was in Swahili. Please don’t overthink the task — translate what it says, not what you think it is attempting to say.

I then pasted the output from Seamless. When their replies came in, I copied them to kamu.si/seamless-elephant-translations, along with any relevant comments. I won’t go through each translation word by word, but you are invited to study the spreadsheet at your leisure.

Experimental results

The first point of entry should be the Seamless translation to English. This is because all readers of this article inherently have the English skills to evaluate the human and machine translations themselves and because there will certainly be no pivot to water down the conversion. Also, you can test this translation yourself with Bing, which makes only one minor grammatical error, and Google Translate, which is muddled but has all the right words in some of the right places. (DeepL does not attempt Swahili.) From Seamless, though:

Large elephants do not have animals to hunt, although lions can attack elephants and weak elephants, although they are highly threatened and endangered by interaction with humans and insects.

Let’s visualize this in segments, pairing the vetted human translation with the Seamless output:

  1. Adult elephants have no natural predators → Large elephants do not have animals to hunt
  2. although lions may attack elephant calves or weak elephants → although lions can attack elephants and weak elephants
  3. Nevertheless, they are highly endangered and threatened → although they are highly threatened and endangered
  4. because of conflict with humans → by interaction with humans
  5. as well as poaching → and insects

These segments appear as splitting points in the other tested languages as well. Let me note that SeamlessM4T’s choice of “interaction” instead of “conflict” in the fourth segment is perfectly acceptable, while “insects” instead of “poaching” in the fifth segment is not.

The AI consistently locates elephants somewhere in the translation, while not once finding elephant calves. Whether elephants are the hunter or the hunted varies across languages, or perhaps the actors are ambiguous. For example, the Dutch elephants have no animals they can hunt, while in Ukrainian, “children of animals hunt them” with no mention of which animals and which “them.” In German, the Seamless translation could equally be “big elephants have no animals that hunt them” or “big elephants have no animals that they hunt;” the translator says that the latter would probably be the initial interpretation, until the listener used context and their knowledge of the world to deduce that lions are more likely to be prey than predator in this situation. Granted, the Swahili verb “wanaowawinda” has a lot going on in front of the root “winda” (to hunt), but Kamusi Labs coded a parser in the mid-2000s that follows every rule in the language to readily handle all ~18,000,000 combinatorial possibilities of every Swahili verb (currently offline because money). Wa-na-o-wa: those are each discernible and meaningful parts that do not need to be estimated artificially. Mark Zuckerberg could have just called to learn who’s zoomin’ who in the savannah. Or he could end up with the Seamless Romanian result, “You elephant, big [using the plural form for ‘big’], don’t have animals that hunt.”
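For the curious, here is a toy illustration of that rule-based approach (my own simplification for this article, not Kamusi’s actual parser). Each slot in the Swahili verb admits a closed set of affixes, so a form like “wanaowawinda” can be decomposed deterministically, with no artificial estimation:

    # Toy rule-based segmentation of one Swahili verb form. The real grammar
    # has many more classes and slots; these tables cover only this example.
    SUBJECT  = {"wa": "they (class 2)"}
    TENSE    = {"na": "present"}
    RELATIVE = {"o": "who/which (class 2)"}
    OBJECT   = {"wa": "them (class 2)"}

    def parse_verb(form: str, root: str) -> list[tuple[str, str]]:
        """Peel affixes off the front of a verb form, slot by slot."""
        assert form.endswith(root), "toy model: root must end the form"
        prefixes = form[: -len(root)]
        glosses = []
        for slot in (SUBJECT, TENSE, RELATIVE, OBJECT):
            for affix, gloss in slot.items():
                if prefixes.startswith(affix):
                    glosses.append((affix, gloss))
                    prefixes = prefixes[len(affix):]
                    break
        return glosses + [(root, "hunt")]

    print(parse_verb("wanaowawinda", "winda"))
    # [('wa', 'they (class 2)'), ('na', 'present'),
    #  ('o', 'who/which (class 2)'), ('wa', 'them (class 2)'),
    #  ('winda', 'hunt')]  ->  "(animals) that hunt them"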

It would be difficult to continue a segment-by-segment analysis without sounding snarky and condescending, which is not my intent. For example, our man in Berlin states, “The German unambiguously says elephants are dangerous and threatening, not endangered and threatened,” but Kamusi Labs has unambiguously failed to find funding to upload the large German dataset we have waiting in the wings, so I’m not really one to snipe at the inadequate treatment of German. I should note with curiosity that none of the translations conquer the idea of “poaching,” although the sw.wikipedia article has a direct hyperlink to the Swahili article for ujangili, which in turn links to articles for the same concept in 54 other languages, which one would expect to be manna from heaven for training the large language model of a multilingual AI. I suspect, without evidence, that the appearance of “jungle” in more than one language in the final segment has to do with the surface similarity between “ujangili” and “jungle,” though it could be that jungles appear frequently in the same vector space as African fauna. As to the rest, it is perhaps best to observe that each language has some part uyoga and some part not uyoga, and leave it at that.

It might also be noteworthy that the platinum native bilinguals were the most resistant to this experiment. Instead of reporting the words that were printed, the Japanese/ Canadian translator first chose to retain her dignity, radioing, “The translation is no good, to be honest, which I don’t know what to translate to in English,” before eventually playing along a few days later. I watched my daughter attempt the Romanian in real time, concluding when she reached the final segments, “No, I’m sorry, I can’t. I can’t even make it into a sentence;” this is a person who, at nine years old, explained that something fell off a dinghy because it was “disequilibrated,” and then reproduced that thought with instantaneous, equal eloquence in her other languages. I can’t speculate too deeply about why the other participants suffered through the task even when they found it nonsensical. For example, an Iranian affiliated with the Persian Academy of Language and Literature, the regulatory body for the Persian language, wrote, “The text is not meaningful in Persian,” but sent back words to work with nonetheless.

Discussion of Meta’s truth claims

On the other hand, I have a thing or two to say about why Seamless pushed forward even when it had no clue.

For starters, as with its competitors, Seamless adheres to MUSA, the Make Up Stuff Algorithm. For example, the word for calf, “ndama,” is clearly out of vocabulary for Seamless, but the program variously neglects to indicate that it is skipping a word in some languages, puts forward “ndama” as if it were a legitimate word in other languages, offers the nonexistent “nado” for Spanish, and produces something that translates as “tribes” for Ukrainian. In Turkish, there is the invention of “fil damlasi,” about which our translator, another Ivy League PhD, says, “Damla means drop and sounds like a drop of elephant.” Google and Bing are just as happy to lie to you when they don’t know something (for example, they both translate conjugated Spanish “hablo” to Italian as the infinitive “parlare,” rather than the correct conjugation “parlo,” a problem that is solved using a knowledge graph based largely on human intelligence, still with a few bugs in the beta, for about 4 million conjugations among five languages at c2c.kamusi.org). DeepL does exactly the same (in the case of “hablar,” getting “hablo” right in Italian for one of the 10 valid matches identified by C2C, but making up something for “hablas”), despite its unsupported claim that it is “the world’s most accurate and nuanced machine translation.”
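The knowledge-graph alternative is easy to picture. Below is a minimal sketch of the idea (hand-built toy tables, not C2C’s actual data or code): analyze the source form into a lemma plus grammatical features, hop the concept link between lemmas, and realize the same features on the target lemma. That is why such a system can return “parlo” for “hablo” instead of shrugging and emitting an infinitive.

    # Toy sketch of concept-to-concept conjugation mapping. Real systems need
    # full paradigm tables and sense disambiguation; this shows the principle.
    ANALYZE = {                      # surface form -> (lemma, person, tense)
        "hablo":  ("hablar", "1sg", "present"),
        "hablas": ("hablar", "2sg", "present"),
    }
    CONCEPT = {"hablar": "parlare"}  # Spanish lemma -> Italian lemma, per sense
    REALIZE = {                      # (lemma, person, tense) -> surface form
        ("parlare", "1sg", "present"): "parlo",
        ("parlare", "2sg", "present"): "parli",
    }

    def translate_form(form: str) -> str:
        lemma, person, tense = ANALYZE[form]     # 1. analyze the source form
        target = CONCEPT[lemma]                  # 2. hop the concept link
        return REALIZE[(target, person, tense)]  # 3. realize the same features

    print(translate_form("hablo"))   # parlo, not the bare infinitive parlare
    print(translate_form("hablas"))  # parli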

Second is a disconnect between what computer scientists mean by translation, and what linguists mean. For computer scientists, translation is a game of horseshoes — you thrill if you get a ringer, but you still get points for getting close. For linguists, the game is more like bowling. Either you knock the pins over, or you own your misses; if it’s not uyoga, you acknowledge that it’s not uyoga, and refine your hunt for uyoga for your next trip to the woods. For a computer scientist, the output, “I will be available tomorrow,” is by all metrics close to a perfect score in terms of the number and arrangement of characters and words, for a sentence that a human translator renders as, “I will not be available tomorrow.”
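The horseshoes scoring is easy to demonstrate with that negation example. The sketch below computes modified n-gram precision, the core ingredient of BLEU, by hand; full BLEU adds a brevity penalty and a geometric mean over n-gram orders, neither of which rescues the inverted meaning.

    # Modified n-gram precision, the heart of BLEU, for the negation example.
    from collections import Counter

    def ngram_precision(hyp: list[str], ref: list[str], n: int) -> float:
        hyp_grams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_grams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        clipped = sum(min(count, ref_grams[g]) for g, count in hyp_grams.items())
        return clipped / max(sum(hyp_grams.values()), 1)

    reference  = "I will not be available tomorrow".split()
    hypothesis = "I will be available tomorrow".split()

    print(ngram_precision(hypothesis, reference, 1))  # 1.0: every word "correct"
    print(ngram_precision(hypothesis, reference, 2))  # 0.75: 3 of 4 bigrams match
    # High marks for a "translation" that says the opposite of the original.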

Finally, we have the siren call of AI. We have all seen some remarkable output from ChatGPT for English. We’ve also all seen output that is banal or outright wrong, but the good stuff makes us over-tolerant of AI’s many spottable misses. Beyond making up stuff, Google’s Bard exhibits willful deception — for example, taking umbrage at being asked for pictures of Jewish and Muslim doppelgangers by stating, “There is no scientific evidence to support the existence of doppelgangers,” and putting forward, “The idea that Jews and Muslims look alike is a harmful stereotype that should be challenged,” after gladly proposing doppelgangers of a certain teenager’s favorite pop icon, Harry Styles. The collective desire to believe that AI is more right than not is amplified when it comes to languages that are foreign to oneself, because, unlike the existence of doppelgangers, we usually have no basis for evaluating the truth claims of a company like Meta about the quality of output we cannot read. I personally cannot tell you whether the Danish or the Hindi translations in this experiment come close to uyoga, but I can agree that the translations have a lot of Danish or Hindi words; they look a lot like uyoga. Moreover, Facebook has all the resources in the world to put into its well-reputed AI program, and the PR personnel to write copy convincing us that Seamless output is safe for human consumption, so who are you to doubt that you’re getting uyoga? AI is where it’s at and where it’s going. Other approaches to translation, such as those based on confirmed rules or data, are just so yesterday. For example, a survey for a Europe-wide EdTech alliance had the question, “Are you currently offering AI as part of your service or product?” Radio buttons offered a few ways that AI might be in use, or this single negative response: “🔘 Not yet.” You can either go AI, or go home.

Conclusions

The research above leads to two takeaways.

The first is the almost accidental finding that the linguistic source material used to create LLMs for Seamless and many other AI endeavors, the World Wide Web, has become absolutely corrupted by auto-generated content from M“T” (machine “translation”). In a minor quest to learn how human authors refer to “poisonous mushrooms” in contemporary Swahili articles published on the internet, I found that a preponderance of the leading Google results had been produced by Google Translate. Further investigation leads to the conclusion that auto-generation has significantly polluted the waters for most languages, with proportionally more damage to languages that have less original human-generated content online. The problem exists on an oceanic scale; for NLP, it can only be compared to the Great Pacific Garbage Patch, where our plastic detritus swells unchecked. It might be possible to determine tell-tale signs that something was written by an actual person, were the effort given the scale currently devoted to sentiment detection within the NLP community, but I have not heard of any such efforts. Absent an emergency push to build filters that detect real human content, the situation will only get worse. You can spend the rest of your day watching videos that show you how to use AI to generate spam that can drive traffic to your site, and have a thousand pages online for indexing in “Swahili” or “Bengali” before you go to sleep. The pollution of the data seas for LLMs and AI was not the subject of this article’s primary research, so I cannot quantify it in any meaningful way at this point, but I must flag it as an urgent concern that extends to every aspect of AI that depends on reliable linguistic data.

The second conclusion set is more localized to one bay along the MT shoreline, though its implications extend beyond the immediate translation system that was tested. Meta has released SeamlessM4T, wrapped in the shiny packaging of using AI to achieve translation among nearly 100 languages. This paper has put that claim through a basic test, using SeamlessM4T to translate a source text from Wikipedia into 14 other languages (40% of those available for demo testing), and then enabling meaningful comparisons by engaging humans to translate those translations to English. The results are available for inspection at kamu.si/seamless-elephant-translations, with the SeamlessM4T output in Column B and the human translation of that output in Column C. SeamlessM4T picked up many of the words and some of the meaning; in most cases it would have been possible for the translators to guess the Wikipedia page discussing the main topic (elephants), but in no case could one have confidence that the SeamlessM4T translation conveyed the meaning of the original text. In all cases, there were serious flaws in the translations that rendered them useless as actual information. We can therefore state that SeamlessM4T is an interesting advance in translation technology among languages that have not previously been directly paired, but it does not produce actual translation among languages that have not previously been directly paired. Whether AI has the potential to produce viable translation in a zero-shot environment is an interesting topic for future discussion, but was not the subject of this research.

Martin Benjamin is the founder and director of Kamusi, an NGO dedicated to gathering linguistic data and setting that data to work within language technologies, with a major goal to include languages that are otherwise neglected in research and trade.
