On the heels of its recently released multilingual machine translation (MT) model, SeamlessM4T, Meta is adding three models (SeamlessStreaming, SeamlessExpressive, and simply Seamless) to its MT toolkit. Built on version 2 of SeamlessM4T, the new open-source models boast state-of-the-art speech recognition and translation capabilities in up to 100 languages. Together, the four models make up the Seamless Communication suite.
“Building upon our foundational multilingual model, SeamlessM4T, we’re making progress towards a future where real-time translation across languages is possible,” Meta Research Engineer Anna Sun says in a company video. According to Meta’s AI research team, the system enables fast, accurate, and expression-preserving speech translation, making it “a significant step towards removing language barriers.”
Each new AI model targets a different aspect of translation. SeamlessStreaming is all about speed, delivering translations “in just under two seconds of latency, which is comparable to the average latency of simultaneous human interpreters,” according to Sun. In a recent blog post, Meta researchers explain, “In contrast to conventional systems which translate when the speaker has finished their sentence, SeamlessStreaming translates while the speaker is still talking.”
Meanwhile, SeamlessExpressive looks to preserve the speaker’s vocal style and emotional tone so that the translated speech doesn’t sound robotic. “It’s not just the words we choose that convey what we want to say — it’s also how we speak them,” the blog post continues. “Tone of voice, pauses, and emphasis carry important signals that help us communicate emotions and intent.”
The all-in-one Seamless model combines the capabilities of SeamlessStreaming and SeamlessExpressive. An innovative “watermarking” feature identifies the audio output as AI-generated, which “helps promote the responsible use of voice preservation technology [and] prevent potential abuses,” the researchers write in the blog post.
Meta researchers conclude that the system brings us closer to a universal translator. In a publicly available research paper, they write, “Seamless gives us a pivotal look at the technical foundation needed to turn the Universal Speech Translator from a science fiction concept into a real-world technology.”