sponsored profile

James Jameson

Innovating the Subtitle at International Scale

Supported by LocWorld

Multimedia is more important to business than ever before. As companies of all sizes pursue international expansion, that means localizing video for foreign-language speakers. And of all the approaches to translating video, none is more venerable or resilient than the humble subtitle.

But preparing and translating video subtitles takes time and manpower — or does it? Language tech company CaptionHub recently added new functionality to its service suite enabling subtitle generation for live video, opening real-time events and presentations to larger, more diverse audiences worldwide.

It’s an impressive achievement — so impressive that CaptionHub Chief Commercial Officer James Jameson took home the top Process Innovation Challenge (PIC) prize at LocWorld 51 in Dublin, Ireland, this June. With so many great inventions presented at the conference, what was it about CaptionHub’s technology that stood above the rest? Jameson told MultiLingual all about it in a recent interview.

Could you tell me a bit about yourself and how you got involved with CaptionHub?

I’ve been with CaptionHub, which Tom Bridges founded in 2015, since its early days. I joined Tom about 18 months into the business. Tom’s background is in broadcast, visual effects, and moving image design, while my background is in enterprise software and scaling enterprise software companies. Since joining Tom, I have been on this journey with him.

CaptionHub is a full-stack multimedia localization platform. We help the world’s largest companies localize their multimedia content at scale. This includes multilingual subtitles for video on demand, live subtitling, on-screen text identification, and voice synthesis, which is currently a very hot topic.

Our live subtitling platform launched in October last year. Since then, we’ve been subtitling the world’s largest consumer events. Unlike other solutions that require hardware encoders or complex technical infrastructure, ours is cost-effective and can be set up in 30 seconds with zero latency. It supports up to 130 languages simultaneously — a unique capability in the market.

What was the genesis of your PIC-winning idea, and when did you begin working on it?

We actually started building our first live subtitling architecture in late 2018. What we have now is our third architecture, which is an entirely new rebuild of CaptionHub Live. The genesis was straightforward: while talking to customers who use CaptionHub to subtitle their videos on demand, they began asking if we could provide live subtitling as well.

Live video is very different from recorded video; it operates in a different environment at a different scale, with a high level of immediacy and time sensitivity. This introduces a complex set of challenges.

A key problem with our first two architectures was that we became a critical part of the broadcast stream’s path. Consider a major live event, like a product launch streamed online. Cameras on stage send the feed to a video encoder, which uses video player technology to distribute the stream around the world in HLS format, ensuring it reaches viewers’ browsers without latency or buffering.

In the early days, our solution took the broadcast stream, transcribed and translated it using AI, and then returned it to the player. This made us a critical part of the broadcast process and put us in the precarious position of doing the job of the video player technologies. These are managed by massive companies like Brightcove and Vimeo, which large clients pay to handle the video broadcast stream. Being in the middle of that process created risk for our customers, for us, and for the video platforms.
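For readers unfamiliar with HLS, the standard approach is to deliver subtitles as a separate rendition listed alongside the video in the master playlist, rather than embedded inside the video stream itself. A minimal sketch of such a playlist follows; the file names, URIs, and language set here are purely illustrative and are not CaptionHub's actual output:

```
#EXTM3U
# Subtitle tracks declared as sidecar WebVTT renditions, separate
# from the video stream (illustrative example only)
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",LANGUAGE="en",URI="subs_en.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="Francais",LANGUAGE="fr",URI="subs_fr.m3u8"
# The video variant references the subtitle group but does not contain it
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,SUBTITLES="subs"
video_1080p.m3u8
```

Because the subtitle renditions sit outside the video variant, a subtitling service can, in principle, publish and update caption playlists without ever touching the broadcast stream that the video platform manages.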

This led to our third architecture, which we started building in early 2022. It has been a long journey, but we’ve been tenacious. We’ve listened to our customers, kept going, and kept investing. Since early 2018, we have invested continuously, and it wasn’t until October last year that we saw any revenue from this effort.

What was the initial feedback you got once you launched this project?

The first person to speak publicly about our technology was an economic buyer at a four-day event in Las Vegas, one of the biggest tech events on the calendar. I had previously pitched an earlier version of our platform that wasn’t suitable for his workflow. By mid-2022, we had a new beta version. I called him and said, “Look, you have your event coming up. What do you think about using this?” He was very excited because no other technology in the market offered what we did.

He committed to our product before its release, which was a huge leap of faith. We signed a strict nondisclosure agreement (NDA). After the event, he broke his own NDA by posting about us on LinkedIn, praising us for solving a long-standing problem and describing it as a world first. This feedback was incredible. Since then, we’ve done similar events for clients worldwide, receiving similar positive feedback.

The feedback we receive is always encouraging. It’s not until viewers rely on subtitles that they realize their importance. When they do, subtitles become essential for understanding the video message.

Of course, the accolades didn’t stop there: you went on to win the 2024 PIC Innovator of the Year – Europe Award. What was it like to be acknowledged by your peers like that?

It was an amazing way to mark the innovation, which has been the result of years of hard work from the technical team. I need to emphasize that I am just the mouthpiece; the technology team has been the driving force behind this project. They are the engineers who have thought about every step of the process, from production onward.

Winning the PIC was truly an honor. We were thrilled just to be shortlisted and wanted to do our best. We provided a frank and concise account, as the PIC presentation is only three minutes long. We had to succinctly present the problem, solution, and impact. It was great to be among amazing peers with incredible technology, and we were ecstatic to win. Finishing the conference on such a high note was fantastic; it really couldn’t have gone any better for us.

What was remarkable about this particular PIC is that we didn’t use any new form of large language model (LLM) or generative AI (GenAI). Under the hood, the actual AI technologies we used were Speechmatics and Amazon Translate. These technologies have been around for a while and are improving, but the innovation wasn’t in the AI itself.

In a saturated AI ecosystem, it’s important to recognize that there’s still significant work to be done in applying AI to the right problem sets. It’s not always about the core foundational AI, but rather about the distribution or application of AI to address specific problems. It’s a good reminder that there’s a right tool for every job.

In this case, it was a particularly innovative use of existing tools. If necessary, we could swap out Amazon Translate for a GenAI LLM version of translation technology in the future. However, we don’t see the need for that at the moment. It’s about choosing the appropriate tools for the job and ultimately delivering quality assurance to our customers.

How would you like to see this technology develop? What would you hope to see, say, five years from now?

Our plan is to continue to be the software leader in this space. I hope we’ll still be providing solutions to our customers and doing what we do now. We’re orchestrating technologies — whether foundational AI, localization technologies, or video technologies — and bringing them into a secure, consolidated platform that can handle huge volumes of video.

We know that video is not going away. Multimedia is expanding and penetrating every aspect of our lives. Ninety-four percent of internet traffic is now video, up from 85 percent only a couple of years ago. Multimedia and AI are here to stay. We sit at the intersection of these trends, staying attuned to short-term market changes and hoping to remain in a defensible position for the long term.

Is there anything you’d like to add?

I’d love to acknowledge two particular team members. One is Tijmen Brommet, the architect of the platform. Not only did he engineer it in an incredibly intelligent way, but he was also so committed that he worked 24-hour shifts during production for our first customer.

The second person is Tom Bridges, our CEO. He is an entrepreneur with the patience to keep investing in a product even when there’s no revenue coming in, which is very hard to do as a business owner. We’re not talking about small sums of money; this was a seven-figure investment.

And finally, our job in the industry is to help brands connect the dots regarding multimedia localization. As long as we can get that message across to our customers, then we’re doing something right.
