Setting up a successful video globalization process at Dell

By Ralph Jung, August 11, 2015

At Dell, we localize more than 250 eCommerce videos into 14 target languages per year, which is unusual in our industry. According to the T-index provided by Translated.net, these languages enable us to reach up to 90% of all internet users in the world. This requires a process that is scalable, cost-efficient and fast, yet still delivers a high level of quality.

A visitor who watches a video will stay an average of two minutes longer on your website and will be 64% more likely to make a purchase, according to a study published by ComScore in 2010. We also know from our own customer feedback that customers are more comfortable engaging with content that is available in their own language.

To successfully localize your videos, there are three main steps to follow: define your video strategy; decide on a localization type; and set up your process.

Define your video strategy

In the software localization industry, it has long been considered a best practice to treat the source language (English in most cases) as just another language. We can follow the same principle here: don’t design your videos for a specific language or culture, but take into account that they need to be localized into multiple languages and cultures.

For product videos, we have found that a length of around two minutes is ideal for keeping our customers engaged; other types of content may have a different ideal length. As most of you are probably aware, translated text naturally expands in many languages. If a video tries to convey too much information within a short time frame, localization will run into space issues: closed captions might get cut off, translated voiceover might require additional recording attempts, and the font size of on-screen text will need to be reduced.
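One way to catch these space issues before publishing is to check each translated caption's reading speed. The following Python sketch is an illustration, not part of Dell's process: the `Cue` class, the `too_fast` helper and the limit of 17 characters per second are all assumptions made for this example.

```python
from dataclasses import dataclass

# Assumption: the article gives no figure; many captioning guidelines cap
# reading speed somewhere around 15-20 characters per second.
MAX_CHARS_PER_SECOND = 17.0


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str

    def chars_per_second(self) -> float:
        duration = self.end - self.start
        if duration <= 0:
            raise ValueError("a cue must have a positive duration")
        # Treat line breaks as spaces so multi-line cues are counted fairly
        return len(self.text.replace("\n", " ")) / duration


def too_fast(cues: list[Cue], limit: float = MAX_CHARS_PER_SECOND) -> list[int]:
    """Return the positions of cues that viewers may not be able to read in time."""
    return [i for i, cue in enumerate(cues) if cue.chars_per_second() > limit]


# The same three-second window, before and after translation
english = Cue(0.0, 3.0, "Press the power button to start.")
german = Cue(0.0, 3.0, "Drücken Sie den Netzschalter, um das Gerät zu starten.")
print(too_fast([english, german]))  # prints [1]
```

The English cue runs at about 11 characters per second, while its German translation needs 18 in the same window, so only the translation is flagged.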

Speakers and voice talents should speak at a slow pace and pronounce all words as clearly as possible. This improves the accuracy of transcriptions and leaves translations enough room to expand.

Make sure that you use language that is not too culture-dependent. Avoid analogies and metaphors, don’t overdo humor and try not to use words with many overlapping connotations, as these make it more likely that your translators will come back for clarifications that can delay your process. Some real-life examples I’ve come across are “they’re a busy-bee family,” “get rid of those handcuffs” and “ear-catching sound.”

If you want to illustrate something with everyday examples, make sure that they are not local. Instead of saying “It’s like winning the Super Bowl,” you could refer to an international sports event, such as “It’s like winning an Olympic medal.” That way, you can be sure that there’s an appropriate translation in all target languages.

Also be careful not to use gestures that might have different meanings in other cultures, such as thumbs-up or thumbs-down. Always portray people in a culturally appropriate way; for example, avoid summer clothing if you want to use the same video in markets that are sensitive to certain clothing styles.

Be careful with information that could easily change or be different depending on the audience. At Dell, we typically don’t mention specific prices or currencies, product colors and features that are not available globally, or local phone numbers and URLs.

Avoid overloading viewers with multiple communication methods at once. Don’t show on-screen text and closed captions at the same time, and give viewers enough time to read everything on the screen before continuing with more spoken information.

Before continuing, ask yourself the following questions:

Where will the videos be published or hosted?

Which languages do the target audiences speak?

What is the budget for video localization?

How many languages can be covered with this amount?

Are there any brand requirements to attend to?

Who will prepare the files for translation and implement the translated files?

These considerations will play a big role in determining your preferred localization type.

Decide on a localization type

People often use the terms closed captions and subtitles interchangeably. In some contexts, however, they mean different things. In the film and TV industry, subtitles usually refer to a transcript of the dialogue, while closed captions contain extra information for deaf or hard-of-hearing viewers, such as “[loud bang].” In this sense, closed captions are meant to replace all sound, not just dialogue. There is also a distinction between open captions and closed captions: open captions are burned into the video footage, while closed captions can be switched on and off. To make it even more confusing, open captions are sometimes called subtitles.

Closed captions offer by far the best trade-off between translation costs, reach and deployment speed. They make your videos accessible to people who are deaf or hard of hearing, which helps you fulfil your legal accessibility requirements. And if your videos are frequently viewed by users at work, during their commute, in retail environments or in waiting areas, closed captions enable all of these users to follow your storyline without having to listen to the audio.

Another factor that should not be underestimated is that search engines increasingly use keywords and phrases in closed captions as ranking factors for their results pages. Major players such as Google and Bing keep adding indexing capabilities for multimedia content.

As long as you don’t burn (hardcode) your closed captions into the video footage, your transcripts and translations remain editable at any time. This means that you can easily correct mistakes, even ones that only become apparent after your video has been published. For this to work, you will need a hosting solution or video player that can display closed captions as a layer on top of the video.

On most video platforms and in most video players, closed captions are displayed at the bottom, so always make sure that you don’t display any other on-screen text or important visuals in the lower part of the video.

Depending on the number of manual preparation and deployment steps, closed captions typically have a turnaround time of four to six days.

Since closed captions are generally stored as text files separate from the video file, you’ll usually only need to maintain a single video file that serves all languages.
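To illustrate why a single video file can serve every language, here is a minimal Python sketch that reads a caption file in the widely used SubRip (SRT) format into timed cues; each language would simply have its own caption file next to the one video. It assumes plain SRT without positioning or styling information, and the `Cue` and `parse_srt` names and the German sample text are hypothetical.

```python
import re
from dataclasses import dataclass

# An SRT time stamp looks like 00:01:02,500 (hours:minutes:seconds,milliseconds)
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_timestamp(value: str) -> float:
    match = TIMESTAMP.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"invalid SRT time stamp: {value!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues; blank lines separate one cue from the next."""
    cues = []
    normalized = content.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty cues
        start, end = (parse_timestamp(part) for part in lines[1].split("-->"))
        cues.append(Cue(int(lines[0]), start, end, "\n".join(lines[2:])))
    return cues


# Hypothetical German caption file for the same video
SAMPLE_DE = """1
00:00:01,000 --> 00:00:03,500
Willkommen!

2
00:00:03,600 --> 00:00:06,000
Dieses Notebook ist leicht
und leistungsstark.
"""

for cue in parse_srt(SAMPLE_DE):
    print(cue.index, cue.start, cue.end, cue.text.replace("\n", " / "))
```

Because the cues are plain text, correcting a translation means editing this file, not re-rendering the video.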

If you want to localize your videos with more authenticity and closeness to your target audience, and you have a significant localization budget at your disposal, your best choice is to rerecord the voiceover with native speakers of your target languages.

There are three different ways to localize voiceovers. The most common is dubbing, in which the translations must closely follow the lip movements of the people in the video. You will need to give your translators much more flexibility with regard to translation accuracy, as they will likely not be able to fit as much information into the same length of spoken dialogue as the source language did. The recordings often require multiple retakes until the voice recording fits the dialogue, which increases recording costs. You may also need multiple voice talents if there is more than one voice in your video.

If your videos use a background narrator instead of a visible spokesperson, you only need to ensure that the translations are of similar length to the original, which makes this approach easier and cheaper than dubbing. Since video editing is less expensive than studio voice recordings, you may choose to have the video edited to fit the audio recordings, whether by using B-roll footage, reusing scene cuts within the same video, speeding up the translated audio track by an unnoticeable amount or stretching the video track very slightly.
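The choice between retiming and re-editing comes down to simple arithmetic: divide the length of the translated narration by the length of the video. The sketch below assumes that a change of up to 5% goes unnoticed; the article does not give a figure, and the `plan_fit` function and `Fit` categories are hypothetical.

```python
from enum import Enum

# Assumption: the article only says "an unnoticeable amount"; 5% is an
# illustrative tolerance, not a measured threshold.
UNNOTICEABLE_CHANGE = 0.05


class Fit(Enum):
    AS_IS = "audio already fits"
    RETIME = "speed up the audio or stretch the video slightly"
    EDIT_VIDEO = "edit the video (B-roll, reused scene cuts)"


def plan_fit(video_seconds: float, audio_seconds: float,
             tolerance: float = UNNOTICEABLE_CHANGE) -> tuple[Fit, float]:
    """Decide how to fit a translated narration to the original video length.

    Returns the suggested action and the ratio of audio length to video length,
    which is also the playback speed the audio would need to fit exactly.
    """
    if video_seconds <= 0 or audio_seconds <= 0:
        raise ValueError("durations must be positive")
    ratio = audio_seconds / video_seconds
    if ratio <= 1.0:
        return Fit.AS_IS, ratio
    if ratio <= 1.0 + tolerance:
        return Fit.RETIME, ratio
    return Fit.EDIT_VIDEO, ratio


# A two-minute video with a translated narration that runs 4.8 seconds long
action, ratio = plan_fit(120.0, 124.8)
print(action.value, f"(ratio {ratio:.2f})")
```

Here the narration is 4% too long, which falls inside the assumed tolerance, so a slight retime is suggested instead of a re-edit.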

There is also a hybrid method called overdubbing, which is very common on European television. Here, the original voice is lowered in volume while the translated voice is layered over the original audio track at normal volume. The translated voice usually starts one or two seconds after the original, to give the impression of a live interpreter. This works best for interviews and documentary-style videos.

The turnaround time for voiceover localization is typically at least double the amount of time needed for closed captions, since you cannot start the recordings without having your translations ready and approved. For the final edit, you will also need to provide separate files for the music and sound effects that were used in the video.

Since each language requires its own audio track, voiceover localization will result in multiple copies of the same video that need to be deployed and maintained.

Another common localization option is on-screen or embedded text. This can cover anything from presentation-style videos with graphs and animations to product shots with feature call-outs.

Inserting translations into videos with on-screen text is typically done by a person using a video editing environment such as After Effects or Final Cut Pro. While this can be done in-house or by a vendor, the editing increases both the costs and the time needed. It is therefore crucial that you have your translations reviewed upfront, since they cannot be edited later without generating additional editing costs.

Turnaround time for on-screen text localization is typically between five and eight working days.

Like voiceover localization, this process requires the creation of multiple copies of the same video, each with its own translated text.

Set up your localization process

If the number of videos to be localized is manageable, you can probably get away with handling everything on an ad hoc basis. If you expect high volumes, or a recurring monthly or quarterly workload, it pays to set up a well-defined process with specific roles, tools and a communication workflow.

Typical workflow steps include:

Transcription, or creation of a timed transcript from the recording script

Handover to translation

Dealing with translation questions (such as ambiguities)

Delivery of translated files

Deployment of video to hosting platform or website

Adding the closed-caption file(s)

Reviewing the translations

Publishing the translated videos

If you choose voiceover, you will need to add steps for reviewing voice samples and language-specific speech guides. Similarly, if you choose localized on-screen text, you’ll need to add editing steps toward the end.
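The workflow steps can be kept as data, so that each localization type gets its own ordered checklist. In this hypothetical Python sketch, the step names follow the article, but exactly where the extra voiceover and editing steps are inserted is an assumption.

```python
from enum import Enum


class LocalizationType(Enum):
    CLOSED_CAPTIONS = "closed captions"
    VOICEOVER = "voiceover"
    ON_SCREEN_TEXT = "on-screen text"


# Base steps, in the order listed in the article
BASE_STEPS = [
    "Transcription, or creation of a timed transcript from the recording script",
    "Handover to translation",
    "Dealing with translation questions",
    "Delivery of translated files",
    "Deployment of video to hosting platform or website",
    "Adding the closed-caption file(s)",
    "Reviewing the translations",
    "Publishing the translated videos",
]


def build_workflow(kind: LocalizationType) -> list[str]:
    """Return an ordered checklist for one video and one target language."""
    steps = list(BASE_STEPS)  # copy, so the base list is never modified
    if kind is LocalizationType.VOICEOVER:
        # Assumption: voice reviews follow the delivery of the translations
        at = steps.index("Delivery of translated files") + 1
        steps[at:at] = [
            "Reviewing voice samples",
            "Reviewing language-specific speech guides",
        ]
    elif kind is LocalizationType.ON_SCREEN_TEXT:
        # Editing goes toward the end, after the translations are reviewed
        at = steps.index("Publishing the translated videos")
        steps.insert(at, "Editing translated text into the video")
    return steps


for number, step in enumerate(build_workflow(LocalizationType.VOICEOVER), start=1):
    print(number, step)
```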

For closed captions and voiceover, you should talk to your translation provider to see if they can work with timed transcripts. They may need experience with video-specific editing or localization tools, such as the free Aegisub or the SRT plugin for Trados Workbench. Initially, our own localization process relied on a number of manual conversion steps to turn the closed-caption file into a translatable document. To achieve a higher level of automation with our translation management system (TMS), we worked with the TMS provider to set up a template that automatically converts the time stamps in the XML-based video captions into ordinary tags. These appear to the translator like formatting tags and can be moved around freely. Once the translations are done, the tags are automatically converted back into time stamps, so we can import the resulting files into our closed-captioning platform. This change allowed us to save roughly one-third of our annual translation budget for eCommerce videos.
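The tag conversion described above can be sketched in a few lines of Python. This is not the template Dell's TMS provider built: it assumes a simplified, namespace-free TTML-style XML file in which each `<p>` element carries `begin` and `end` attributes and contains plain text only, and it uses a hypothetical `{n}...{/n}` tag syntax.

```python
import re
import xml.etree.ElementTree as ET

# Matches one {n}...{/n} pair; the backreference \1 ties the closing tag to its number
TAG_PAIR = re.compile(r"\{(\d+)\}(.*?)\{/\1\}", re.DOTALL)


def captions_to_tagged(xml_text: str) -> tuple[str, dict[int, tuple[str, str]]]:
    """Replace each cue's time stamps with a numbered tag pair.

    Returns the translatable text and a mapping from tag number to (begin, end).
    """
    root = ET.fromstring(xml_text)
    timings = {}
    segments = []
    for n, cue in enumerate(root.iter("p"), start=1):
        timings[n] = (cue.get("begin"), cue.get("end"))
        segments.append(f"{{{n}}}{cue.text or ''}{{/{n}}}")
    return "\n".join(segments), timings


def tagged_to_captions(tagged: str, timings: dict[int, tuple[str, str]]) -> str:
    """Turn translated, tagged text back into caption XML with the original time stamps."""
    tt = ET.Element("tt")
    div = ET.SubElement(ET.SubElement(tt, "body"), "div")
    for match in TAG_PAIR.finditer(tagged):
        begin, end = timings[int(match.group(1))]
        cue = ET.SubElement(div, "p", begin=begin, end=end)
        cue.text = match.group(2)
    return ET.tostring(tt, encoding="unicode")


SOURCE = """<tt><body><div>
<p begin="00:00:01.000" end="00:00:03.500">Welcome to your new laptop.</p>
<p begin="00:00:03.600" end="00:00:06.000">It is light and powerful.</p>
</div></body></tt>"""

tagged, timings = captions_to_tagged(SOURCE)
print(tagged)
# The translator only changes the text between the tags
translated = (tagged
              .replace("Welcome to your new laptop.", "Willkommen bei Ihrem neuen Laptop.")
              .replace("It is light and powerful.", "Es ist leicht und leistungsstark."))
print(tagged_to_captions(translated, timings))
```

The translator never sees a time stamp, only numbered tags, and the stored timings are reattached once the translation comes back.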

Where possible, it is always preferable to use the original recording scripts as the source document for your translation requests, instead of a transcript. While it is possible to transcribe videos after their production, transcription adds time and unnecessary costs, and carries a higher risk of introducing ambiguities. For example, did the speaker say terabytes or petabytes? In the best case, ambiguities like this are caught by the transcriber or by translators later in the process, and you just lose some time. In the worst case, you will need to send your video back for re-editing, or you may even create a legal liability.

Did you know that you can use YouTube’s speech recognition engine to kick-start your transcription? YouTube automatically adds a computer-generated transcription to each uploaded video. While these transcripts usually require heavy post-editing, this is often faster than transcribing from scratch. Alternatively, if you have the original script that was used to record your video, you can simply paste the entire text into YouTube, and its algorithm will automatically add the time stamps for you.
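This is not how YouTube times a script; the following is only a crude stand-in that shows what a timed transcript generated from a recording script looks like. It splits the script into sentences and spreads the video's duration across them in proportion to their length, so the result would still need manual adjustment. The function names and the sample script are hypothetical, and the output uses the SRT format.

```python
import re


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT time stamp, e.g. 00:00:07,500."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def rough_timings(script: str, total_seconds: float) -> list[tuple[float, float, str]]:
    """Give each sentence a share of the duration proportional to its length."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s]
    if not sentences:
        return []
    total_chars = sum(len(s) for s in sentences)
    cues = []
    start = 0.0
    for sentence in sentences:
        end = start + total_seconds * len(sentence) / total_chars
        cues.append((start, end, sentence))
        start = end
    return cues


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    blocks = [
        f"{i}\n{format_srt_time(start)} --> {format_srt_time(end)}\n{text}\n"
        for i, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)  # a blank line separates consecutive cues


SCRIPT = "Meet the new laptop. It is light and powerful. The battery lasts all day."
print(to_srt(rough_timings(SCRIPT, 9.0)))
```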

If you produce or localize time-critical videos, you might want to consider what I call the “publish-first” localization model. Many of the videos that come through our process are product videos, which need to be ready for the launch of new products. Product videos explain new features, showcase the design and provide examples of how a product can be used in real-life situations.

Since most of our videos have closed captions, and translation reviews can significantly delay the readiness of our videos, we decided to make videos available for review after they are published to the website. For this, we use a third-party closed-captioning platform that gives our reviewers direct access to a web-based editor, where they can “unpublish” closed captions, apply any edits they consider necessary and sync their changes with the online video in real time.

As long as you work with translators or translation vendors that you trust, and have appropriate quality assurance measures in place, the risks of the publish-first model are very manageable. It helps take some of the pressure off the video deployment team and gives everyone some extra breathing space.

In the end, which solutions work best for you depends on the overall scope of your video program. You can start small and gradually increase the complexity and throughput of your process as needed. If your website uses analytics to measure success, don’t hesitate to use them here as well. One of the more reliable measures is how many of your video viewers convert into customers, sign up for your newsletter or register a user name, compared with website visitors who don’t watch a video. At Dell, including videos on all relevant pages has proven to be a very successful strategy.
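The comparison between viewers and non-viewers comes down to two conversion rates. This minimal sketch computes the relative lift of video viewers over non-viewers; the `Segment` class and `video_lift` function are hypothetical, and the visitor numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    visitors: int
    conversions: int  # purchases, newsletter sign-ups or registrations

    @property
    def rate(self) -> float:
        if self.visitors == 0:
            raise ValueError("segment has no visitors")
        return self.conversions / self.visitors


def video_lift(viewers: Segment, non_viewers: Segment) -> float:
    """Relative difference in conversion rate between video viewers and non-viewers."""
    return viewers.rate / non_viewers.rate - 1


# Made-up numbers for illustration only
lift = video_lift(Segment(visitors=1000, conversions=30),
                  Segment(visitors=4000, conversions=80))
print(f"Video viewers convert {lift:.0%} more often")  # prints "Video viewers convert 50% more often"
```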