For years, live translation on Zoom has been something of a workaround. Services like Wordly or Akkadu had to rely on clever but clunky methods—capturing audio locally, simulating users, or analyzing post-meeting recordings. But that era might be ending.
With the recent launch of its Realtime Media Streams (RTMS) API, Zoom is now letting developers tap directly into live meetings. Audio, video, transcripts, screen shares, and chat—all streamed securely in real time, per participant.
In other words, Zoom just cracked open the black box.
A New Pipeline for Live Language Tech
Instead of building side-channel integrations, developers can now work with native Zoom data streams. That means lower latency, better accuracy, and much more control—particularly for AI-powered translation and captioning tools.
RTMS lets systems process multilingual meetings as they happen. No bots required. Just direct, structured data via WebSocket.
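To make "structured data, per participant" concrete, here is a minimal sketch of what a consumer might do once messages arrive over the socket. It does not use Zoom's actual RTMS schema or SDK: the JSON fields `type`, `participant_id`, and `text`, and the function name `group_transcripts`, are all hypothetical and chosen only for illustration. The network layer is left out, so the function simply takes messages that have already been received.

```python
import json
from collections import defaultdict


def group_transcripts(raw_messages):
    """Collect transcript text per participant from JSON-encoded messages.

    Assumed (hypothetical) message shape, not Zoom's real schema:
        {"type": "transcript", "participant_id": "...", "text": "..."}
    """
    by_participant = defaultdict(list)
    for raw in raw_messages:
        msg = json.loads(raw)
        # Skip anything that is not a transcript fragment (chat, audio, etc.).
        if msg.get("type") != "transcript":
            continue
        by_participant[msg["participant_id"]].append(msg["text"])
    # Join each participant's fragments in arrival order.
    return {pid: " ".join(parts) for pid, parts in by_participant.items()}
```

A translation or captioning service could hand each participant's text to its own language pipeline, which is the kind of per-speaker control that side-channel audio capture made difficult.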
Zoom also appears to be playing it safe. According to the company, the API complies with SOC 2, HIPAA, and GDPR. Content isn’t used to train Zoom’s AI models, and user consent is required.
Translation Is Just the Beginning
The implications extend beyond language. RTMS is already being tested for use cases like:
- Generating real-time clinical notes in telehealth
- Providing live coaching during sales presentations
- Detecting deepfake voices for compliance monitoring
Still, for the language industry, the clearest opportunity is in live multilingual access—something that’s been patchwork until now.
A Long-Awaited Opening
With over 500 million users, Zoom’s decision to unlock meeting data marks a shift in how real-time translation can be embedded—natively, not as an afterthought.
It’s not just a technical update. It’s an infrastructure signal.
One that says real-time, AI-powered language access is no longer an add-on but part of the meeting itself. With translation on Zoom now able to draw on native data streams, multilingual access finally moves from workaround to built-in infrastructure.

