Artificial intelligence has quickly shown itself to be a disruptor of many industries. In localization specifically, AI has the potential to streamline many processes and enable new workflows. It seems that every few months we are asking how AI will change the world, how customer needs are driving its evolution, and how we are tackling the challenges that come with such a rapidly evolving technology.
AI will inevitably play a greater role in supporting content localization in our industry, but perhaps not in the ways we anticipate.
Who owns my voice?
Localized video content, particularly through the art of voice-over dubbing, has the potential to forge a connection, trigger emotional responses, and bring characters to life through authentic performances – all of which have a compelling effect on storytelling. Speech-to-text generators have been used for several years in the subtitling marketplace, and we’re increasingly seeing AI-enabled text-to-speech voice generators and adjacent tools that promise studio quality and natural-sounding voice-overs. Microsoft’s text-to-speech AI model, VALL-E, claims to simulate a person’s voice from only a three-second audio sample. Its creators speculate that AI speech editing will be able to change a person’s voice recording simply by altering the text transcript – that is, making them say something they didn’t originally say. YouTubers are producing plenty of how-to videos on creating high-quality AI voice-overs using ChatGPT, showcasing how quickly AI is transforming traditional VO processes – all with the goal of producing human-sounding speech that doesn’t seem computer-generated.
There are many unanswered questions about AI voice-overs. Who owns those voices? What are the copyright protections and regulations when AI changes what a voice artist originally recorded? What if an artist’s voice is altered in an AI workflow and repurposed? How does one even track these instances when authentication is getting increasingly difficult? What if AI-generated work resembles the voice of an existing artist or piece of content but is created in a different language? Even if an artist consents to AI work, who owns the copyright to their prior work and performances? With artificial intelligence able to alter voices (and images), who holds the rights to do so? What are the potential risks of AI localization business models? These are murky waters indeed, and with the speed at which AI is staking its claim, these are important ethical and legal considerations to keep front of mind.
The synergy of AI and humans
As studios and content producers continue to reach new audiences through localization (whether by dubbing or subtitling), content owners are looking for tools that can compress their timelines and get titles to platform more quickly. Given current industry trends, they also want to do it more cost-effectively. As content owners work to capitalize on opportunities to sell their titles and catalogs in new territories, more automated video dubbing providers are entering the market, offering lower price tags and promising to reduce bottlenecks – in some cases claiming tools that will revolutionize the industry and eliminate the need for human intervention altogether. And yes, it’s true that AI and machine learning technologies can save money and may get content to streaming video-on-demand services faster, but at what cost, and at what quality? Machine translation output has improved in recent years, as have speech recognition, speech synthesis, and natural language processing within the localization industry, but human involvement is still needed to compensate for AI’s deficiencies.
The dubbing process relies heavily on context, emotion, colloquial language, and the subtle situational and cultural nuances that high-quality localization demands. These complexities and critical levels of quality cannot yet be delivered by AI and machine learning. Most believe AI will continue to develop, but there is also general agreement that humans will remain involved in processes where quality matters this much for quite some time. Simply put, maintaining the required level of quality and consistency in high-end localization is not possible without extensive human intervention. Thus, I see AI in its current state as an assistive tool, albeit one that is evolving very quickly. Without a doubt, however, AI workflows are going to be transformative, while still leaving room for human creativity.
AI use cases
This is not to say that AI tools aren’t a good fit for localizing certain types of content. There is a wide range of use cases – promotional videos, explainer and training videos, e-learning content, podcasts, and some social media – where a lesser degree of artistry and performance is required. For example, here at Visual Data, we are seeing an increase in requests for simulated voices in our localization services, especially for content without stringent quality requirements or for low-budget multilingual content bound for international release.
When it comes to delivering AI-adapted content, it’s about meeting audience expectations and upholding the perception of an individual’s or a company’s brand. Delivering content perceived as lower quality may hurt a brand in the long run, increase churn rates, or simply cause the audience to tune out.
I also see a strong case for AI tools helping us analyze data, track consumer behavior, and measure how localization impacts viewer retention and monetization.
AI technology is a tool; it cannot solve all our challenges, and it cannot replace the human element which remains at the core of our work. It is certainly fascinating, however, to experience all these technological advances and developments, and be a part of this rapid and pivotal revolution.