Live Machine Translation:

A new dawn for conference interpreters?

By Oddmund Braaten

Advancements in communication technology are helping bring the world closer together, without anyone ever having to leave the house.

As a remote-first business, we see many of our colleagues virtually travel across five continents before lunch most days.

Everyone from large corporations and NGOs to SMEs and freelancers can gain access to live translation services, making it extremely easy to cater to audiences across the globe. It also reduces the need for interpreters to travel, thereby minimizing costs and the impact on the environment.

But as events went digital during the pandemic, the need for language interpreting skyrocketed. Demand is only expected to grow as the world continues to connect and conduct business in hybrid participation setups, and businesses are having to find new solutions to cater to international, dispersed audiences.

Not enough interpreters to meet increased demand

The benefits of technologies such as remote simultaneous interpretation can be seen not only in the proliferation of online events, but in any kind of event setup. From traditional conferences to town halls, training sessions, and press conferences, events have shifted from traditional setups with on-site support from interpreters and conferencing hardware towards cloud-based interpreting done remotely.

The change in how events are being delivered has opened up previously unexplored opportunities to invite and connect with attendees. Many enjoy being able to connect from the other end of the world, but for those that prefer the buzz and excitement of being on the ground, hybrid events are becoming increasingly popular to cater to both sides.

However, the globalization of events could soon lead to a shortfall of skilled interpreters catering to rising demand. The continued development of AI and machine translation tools is therefore critical, and there have been recent announcements from tech giants such as Meta and Google on their own translation software.

Common machine translation myths are being debunked, helping to spread adoption and open up new communication methods.

Can we automate live translation?

To help combat language barriers, advancements in technology are helping accurately capture, transcribe, and translate speech from one language into another. More specifically, a combination of two different types of artificial intelligence technology is making language solutions more accessible, right at our fingertips, and at short notice.

How accurate, you ask? Well, according to one study undertaken of the top five machine translation tools, they’re a lot more accurate than stereotypes would have people believe — in some cases requiring no changes or edits whatsoever from professional linguists.

Together, automated speech recognition (ASR) and machine translation (MT) technologies are able to transcribe and translate live speech. Attendees are provided with real-time closed captions which they can turn on and off depending on their preferred language.

The difference between ASR and MT captions is that with ASR, the AI-powered technology automatically recognizes the speech and transcribes it into text in real time. With machine translation, AI-powered technology will automatically translate the speech from one language into another, and display it as text in real time.
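As a rough illustration of how the two steps chain together, here is a minimal Python sketch. It is not how any particular product works: the function names (live_captions, toy_transcribe, toy_translate) and the tiny dictionary are hypothetical stand-ins, and plain strings stand in for audio segments. A real system would call a speech recognition engine and a translation engine at the marked steps.

```python
from typing import Callable, Iterable, Iterator, Tuple


def live_captions(
    segments: Iterable[str],
    transcribe: Callable[[str], str],
    translate: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (source caption, translated caption) pairs, one per segment."""
    for segment in segments:
        source_text = transcribe(segment)       # ASR step: speech -> text
        target_text = translate(source_text)    # MT step: text -> other language
        yield source_text, target_text


# Hypothetical stand-ins so the sketch runs without any real engine.
def toy_transcribe(segment: str) -> str:
    return segment.strip().lower()


TOY_DICTIONARY = {"good morning": "buenos días", "thank you": "gracias"}


def toy_translate(text: str) -> str:
    # Fall back to the source text when the phrase is unknown.
    return TOY_DICTIONARY.get(text, text)


for source, target in live_captions(
    ["  Good morning ", "Thank you"], toy_transcribe, toy_translate
):
    print(f"{source} -> {target}")
```

Yielding one pair per segment mirrors the way captions appear shortly after each phrase is spoken, rather than after the whole speech has finished.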

The key difference between the two comes down to machine translation versus human interpretation. Conference interpreters will always strive to convey the message of the speaker, and may paraphrase, while machine translation aims for completeness of translation of the sentences spoken.

So, whereas MT captions provide a complete translation of the sentences spoken, ASR captions from interpreting audio are being used in conferences involving simultaneous interpretation and are in sync with the audio interpretation.

Because the captions are based on a professional live audio translation from a vetted and subject-savvy conference interpreter, the speech is translated by taking cultural aspects, context, and tone of voice into consideration.

However, we know that captions and even translations aren’t always accurate. Perhaps there’s a bit of background noise which makes it hard to hear clearly, or a speaker has used a new technical term or acronym which the translator doesn’t fully understand.

To significantly reduce the chance of this happening, glossary functions can be set up where users can input certain terms or technical details. These can then enhance the accuracy of captions by preloading the system with things like very specific terminology or brand names. Professional linguists can help review and vet the glossary for certain events and, through the use of machine learning, the system only gets more accurate the more it’s used.
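To make the glossary idea concrete, here is a small Python sketch under stated assumptions. It corrects caption text after the fact by swapping known mis-hearings for the preferred spelling. Production systems may instead feed the glossary into the recognition engine itself, so treat this as a simplified stand-in. The function name apply_glossary and the sample mis-hearings are hypothetical.

```python
import re


def apply_glossary(caption: str, glossary: dict) -> str:
    """Replace known variants in a caption with their preferred spelling."""
    if not glossary:
        return caption
    lookup = {variant.lower(): preferred for variant, preferred in glossary.items()}
    # Try longer variants first so multi-word entries win over shorter ones.
    variants = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(v) for v in variants) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lookup[match.group(0).lower()], caption)


# Hypothetical glossary: mis-heard form -> vetted term.
glossary = {
    "inter prefy": "Interprefy",
    "r s i": "RSI",
    "asr": "ASR",
}

print(apply_glossary("Welcome to inter prefy, where r s i meets asr captions.", glossary))
```

Matching on whole words only keeps the glossary from rewriting letters that happen to appear inside unrelated words.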

Why the use of captions?

As a visual aid to follow the speech, captions are provided to viewers and attendees, with people able to follow discussions with live transcription provided almost instantly after the speaker has delivered their words on stage.

Well, these are especially useful for delegates and attendees who are for some reason unable to hear what is being said, for those who choose to read rather than listen, and for those who need visual reinforcement.

We see this a lot now in our personal lives, with many people preferring to enable subtitles when streaming video content on YouTube or Netflix even if they fully understand the language being spoken. No one ever wants to miss a word out of fear of confusion, which is why it’s important to cater towards all audiences no matter what their preference is.

Let’s consider other examples where captions might be useful to provide:

  • The deaf and hard-of-hearing, who can follow the dialogue in written form with the aid of captioning.
  • People who wish to follow the discussion but are in a location where another dialogue is taking place.
  • Individuals in a noisy environment like in a café who wish to follow the event even when listening conditions are poor.
  • Those who wish to have a readable feed to back up their understanding of what is being said. For instance, at a chemistry conference where complex formulas are being voiced, it is sometimes useful to have a readable text feed alongside the spoken words.
  • Those attending (but not contributing) in areas of poor network connection where audio feeds may be unreliable.

It means conferences can now support any kind of participant. Not just people who don’t understand the host’s language, but also the deaf and hard-of-hearing, those who simply prefer to have subtitles or captions, and anyone joining from loud environments such as coffee shops or while traveling.

Machine translations are a significant step towards demolishing language barriers and making events inclusive for all. But where do the interpreters come into this?

Taking the robot out of the human

It’s important to remember that, while the technology is perfectly capable of working on its own, there will always be times when interpreters are needed. Just as computer-aided-translation systems led to more work for translators, demand for interpreters will only grow as automation advances.

It’s great at large global events, for example, where one-to-many translation makes more sense, or during more technical medical or legal conferences where added context and expertise are needed.

Plus, as we know, technology isn’t always perfect. Sometimes it’s how someone said something rather than specifically what they said, and it’s hard for a machine to pick these nuances up and translate them. Language also changes fast, and there can be new words, phrases, or abbreviations in use that the technology hasn’t figured out yet.

There are also technical aspects which mean a human touch will always be required. Machine translations rely on using large volumes of language data to quickly interpret what’s being said. So if the data isn’t there in the first place, or if there isn’t enough data for less commonly spoken languages, things can quickly become muddled.

Which is why, despite the rapid and significant improvements we’ve seen in machine translations, there continues to be a place for human beings at the table. What machine translations do is help take out the robotic, repetitive elements for interpreters — and for good reason.

Conference interpreting is the third most stressful job in the world, according to the World Health Organization. It’s right behind being a fighter pilot and an air traffic controller. Being able to listen, understand, translate, and then talk while constantly switching between languages requires concentration and uniquely intense focus.

The technology is simply there for when interpreters can’t be used. As this BBC article puts it, “The world’s most powerful computers can’t perform accurate real-time interpreting of one language to another. Yet human interpreters do it with ease.”

Interpreters and translators can help prepare machines for higher accuracy, for instance through glossary creation of context-specific terms, names, abbreviations, and more.

Machine translations taking center stage quickly

Machine translations are moving full steam ahead, with automated transcription technology now hooking directly into video conferencing software such as Zoom or Teams.

Organizers can provide languages to whoever wants to attend, from any country and at rapid speed. Attendees get to enjoy speech in their native language while fully engaging in an event that caters for them.

With always-on translated speech, language barriers could soon become a thing of the past. If we can continue improving on the technology and make the lives of interpreters a little bit easier, then it’s a win-win situation for everyone involved.

Oddmund Braaten is CEO at Interprefy.



MultiLingual Media LLC