Beyond Text: Translated’s Multimodal Tech

Every great interview makes the future feel close enough to touch. That’s the vibe in this episode of Localization Today, as Director of AI Sébastien Bratières (Translated) explains why language tech must move beyond text—into voice, video, handwriting, and even the physical world. Recorded on a bus in Rome during the July 3 kickoff of a new EU effort, Bratières lays out how Translated and a consortium of top labs are building the next generation of “physical AI.”

👉 Don’t just read about it—listen to Beyond Text: Translated’s Multimodal Tech right here.

Why language needs the world

Bratières introduces DVPS (Diversibus Vis Plurima Solvo), a four-year project to develop multimodal foundation models with partners including EPFL, Oxford, and ETH Zurich. The thesis is simple: language gets its meaning from context. Prosody, gaze, gesture, and surroundings don’t show up in plain text, but they shape what we say and how we understand it.

DVPS treats modality broadly: not only speech and video, but handwriting dynamics (stylus traces), earth observation (satellite imagery), and cardiology (medical signals). The aim is to train models that can learn from—and reason about—real-world context, then apply that understanding to classic language tasks like transcription, translation, dubbing, and dialogue generation.

Training data is evolving, too. With the supply of web text plateauing and a growing share of it machine-generated, the team is looking to responsibly licensed audio-video corpora that preserve the human cues text alone strips away.

What multimodality unlocks for professionals

For language pros, this isn’t “human vs. machine.” It’s human+AI vs. human. As tools span more modalities, the job surface expands: semi-automatic subtitling, AI-assisted dubbing, context-aware review, and content creation that blends text, timing, and tone. The boundaries between roles (translator, subtitler, VO director, localization PM) start to blur—and education will have to catch up, adding tool fluency and multimodal thinking to the curriculum.

Inside DVPS: a four-year plan (and moving target)

The roadmap mixes basic research and engineering: probing scaling laws for multimodal models, securing high-performance compute, and delivering 12 application use cases spanning language, environment, and medicine. The team expects to rewrite plans repeatedly—a feature, not a bug, of working at the edge where methods, benchmarks, and best practices shift fast. As Bratières notes, recent advances (and efficiency breakthroughs like those showcased by other ecosystems) are a reminder to combine ambition with agile, lightweight engineering.


Want more conversations like this? Find Localization Today on Spotify, Apple Podcasts, or at Localization Today online.

MultiLingual Staff
MultiLingual creates go-to news and resources for language industry professionals.
