Design and evolution of the localization pipeline in Snowdrop


Virginia Boyero

Virginia Boyero is an associate producer at Massive Entertainment | A Ubisoft Studio. She started in the games industry in 2006 through linguistic testing, before joining Massive in 2010. Virginia holds a degree in history, a passion that she shares with interactive media and language.

I had just started as a localization project manager when I moved to Tom Clancy’s The Division, the first title released on the Snowdrop game engine in 2016. The first task of the localization taskforce was to develop the full pipeline, which would determine how all UI text was created and handled. This was a very rare opportunity, as game engines are not often developed from scratch.

There was a text database, but it was not connected to the game. All in-game text was hardcoded, embedded and scattered everywhere in the code. We had a few months to set it all up for the first playable version that required localization, an internal milestone that marked the end of preproduction and the start of the real production phase of the game.

The Snowdrop philosophy is centered around flexibility, fast iterations and empowering the developer to create content with as few obstacles as possible. In this spirit, the main request from the developers with regard to localization was to remove the intermediate steps for creating text. Traditionally, someone from the localization department is in charge of curating the database and creating all text on demand, as the features are designed and content is created. This had also been our experience on previous games developed at Massive: World in Conflict, AC Revelations and FarCry3.

Once the text is in the database and has been given a string name and a unique ID, the localization department passes the unique ID back to the developer, who inserts it in code so that the text is finally displayed. If something changes in the design and the text needs adjusting, it has to be requested all over again.

This workflow has the benefit of leaving a well-curated and clean database, but it is far from ideal and comes at the expense of long iterations and developer frustration. Naturally, developers would very often turn to the much faster method of hardcoding the text, where there was no iteration time and they had full control in case the feature required the text to change. However, this makes the text difficult to find, and most importantly it prevents the text from being translated. This type of error can cause the game to fail third-party submission, where the game is approved for release by Sony or Microsoft, and it is considered a must-fix bug.

Because of these hurdles and the late, intense bug fixing that localization is associated with, developers often see localization as an inconvenience that does not benefit them in any way, since the game is developed in English. In the localization taskforce creating the pipeline, we were determined to shift that sentiment and integrate localization earlier in the production cycle. We would remove the temptation to hardcode text by making localized text as accessible and easy to use as possible.

In granting the developers’ request to remove the middle person curating the database, and giving the editors the ability to create text on their end, we also demanded accountability for the content, which in return gave us a much higher level of commitment. Suddenly, the text was not something that someone else needed to take care of, but something that they were involved in from the beginning and had direct control of.

This was also a new approach for the localization department, however. Changing the workflow completely, we were giving up the control of what was going in and out of the database. We would no longer have a central owner of the text, and it would be coming in from all departments, all at once.

Category system

To reduce the number of mandatory fields required to submit their data, and encourage developers to focus on the text and on the contextual information of lines, we also eliminated the need for a string name, and at the same time eliminated another classic bug. In other game editors, lines could also be referenced by their string name rather than just by their unique ID. This might seem clearer from the code side, but if the string is renamed, the line is broken and shows the code rather than the text in the game. With our approach, this would not be a bug anymore.
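The difference between referencing a line by name and by ID can be illustrated with a minimal sketch. Everything here (the `Line` and `TextDatabase` classes, the `label` field and the method names) is hypothetical and is not Snowdrop's actual API; it only shows why a lookup keyed on an immutable ID cannot be broken by a rename.

```python
from dataclasses import dataclass


@dataclass
class Line:
    line_id: int      # unique and immutable: the only key the game uses
    text: str         # source (English) text
    label: str = ""   # optional human-readable name, free to change


class TextDatabase:
    """Hypothetical in-memory stand-in for the text database."""

    def __init__(self):
        self._lines = {}
        self._next_id = 1

    def create(self, text, label=""):
        line = Line(self._next_id, text, label)
        self._lines[line.line_id] = line
        self._next_id += 1
        return line.line_id

    def rename(self, line_id, new_label):
        # Renaming touches only the label, never the lookup key.
        self._lines[line_id].label = new_label

    def resolve(self, line_id):
        return self._lines[line_id].text


db = TextDatabase()
start_id = db.create("Start mission", label="menu_start")
db.rename(start_id, "ui_menu_start_mission")
resolved = db.resolve(start_id)  # still resolves after the rename
```

Had the game referenced the line by `label`, the rename would have left the reference dangling, which is the classic bug described above.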

The text was being created in the game and imported to the database following the same folder structure, more or less organized according to where the assets in the game were stored. We were losing the contextual information that the string names provided, and although from the localization management end it was more of an annoyance than a critical issue, it was getting more and more confusing for translators as the game grew in scope. It became difficult to keep the database tidy. To address this, during the development of The Division 2 the localization project manager introduced the concept of a category system, a sort of compromise that would only require developers to select a field from a drop-down list (Figure 1). This would give enough contextual information to know where the text belonged, and because it was appended to the string name automatically, it did not require manual fixing.

Figure 1: The text widget focuses on the essentials. Changing the default |empty| to the intended text and selecting a category from a drop-down list is enough to create a line that can be translated in all languages.
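As a rough sketch of the idea, creating a line could require only the text and a category, with the name derived automatically. The category values, the name format and the `create_line` function below are assumptions made for illustration; only the `|empty|` default and the drop-down selection come from the description above.

```python
from enum import Enum
from itertools import count


class Category(Enum):
    # Hypothetical categories; the real drop-down list is not documented here.
    MENU = "menu"
    HUD = "hud"
    MISSION = "mission"


DEFAULT_TEXT = "|empty|"   # placeholder shown by the text widget
_ids = count(1)            # stand-in for the database's unique ID generator


def create_line(text, category):
    """Create a translatable line from just a text and a category."""
    if text == DEFAULT_TEXT or not text.strip():
        raise ValueError("replace the default text before submitting")
    line_id = next(_ids)
    return {
        "id": line_id,
        # The category is appended to the generated name automatically,
        # so nobody has to type or fix a string name by hand.
        "name": f"line_{line_id:06d}_{category.value}",
        "category": category.value,
        "text": text,
    }


line = create_line("Open map", Category.HUD)
```

Because the category is a fixed choice rather than free text, it gives translators context without adding a field that can be misspelled.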

Unused and resuscitated

We approached designing the tools in a way that would make creating bugs difficult. Another common problem that we tackled from the tools side was the reuse of strings in different screens with different visuals. This practice was also a consequence of the text pipeline being convoluted and out of reach of the content creators. Since getting the ID for a line was a long and complicated process, they would hold on to that ID for as long as possible and insert it in multiple screens where the same text appeared. However, the text box was not necessarily the same size and style in all those screens, and it often had character length constraints (normally, the maximum character length is given by how many instances of the letter “e” can fit in the field, as that gives the best average of typographic space). That would force translators to shorten the text to fit the smallest text box, and the shortened text would then also appear unnecessarily in the fields that had enough space to display the full text.

The pipeline was designed so that all text had to be unique. A logical result of this is that we would have multiple lines with the same text. This could be seen as inefficient and costly; however, with the help of computer-assisted translation tools and an embedded pretranslation feature in our own database, the cost in time and money was far lower than that of the bug fixing and lower quality associated with reusing strings.
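A minimal sketch of how pretranslation keeps duplicate lines cheap is shown below. It assumes a simple exact-match translation memory keyed on source text and language; the `pretranslate` function and the data shapes are hypothetical and do not describe the actual database feature.

```python
def pretranslate(lines, memory, language):
    """Pre-fill translations for lines whose source text is already in memory.

    lines:  {line_id: source_text}, every line unique even if the text repeats
    memory: {(source_text, language): translated_text}
    """
    prefilled = {}
    for line_id, source in lines.items():
        translated = memory.get((source, language))
        if translated is not None:
            prefilled[line_id] = translated
    return prefilled


# Two separate lines share the same source text but keep separate IDs.
lines = {101: "Confirm", 102: "Confirm", 103: "Abort mission"}
memory = {("Confirm", "fr"): "Confirmer"}
prefilled = pretranslate(lines, memory, "fr")
# Line 103 has no match and goes to the translator as new text.
```

Since lines 101 and 102 remain separate entries, a translator can still shorten one of them for a narrow text box without affecting the other.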

The second adaptation was keeping the database current. With text being so easy to create and with so many content creators, the chances were high that text was going through iterations or had been created for test purposes. Dealing with increased volumes of text coming from duplicate lines was not an issue, but translating everything that was ever created through the development cycle of the game was a very different and unreasonable proposition. Again, we built into the design of the tools the functionality that would keep the database clean. Through a dependency system, the database would be able to tell if a line was still being used, or if the asset that it was linked to had been deprecated. In the same way that lines were being imported to the database, the lines that were not in use would be moved to a specific location. If the lines became used again, they would be resuscitated and moved back to their original location, showing up automatically in the next batch that went out for translation.
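The unused-and-resuscitated behaviour can be sketched as a periodic sweep over the dependency data. The folder names, the `sweep` and `translation_batch` functions and the data layout are illustrative assumptions, not the real dependency system.

```python
UNUSED_FOLDER = "_unused"   # hypothetical holding location for unused lines


def sweep(lines, references):
    """Park unreferenced lines and resuscitate lines that are used again.

    lines:      {line_id: {"home": original_folder, "location": current_folder}}
    references: {line_id: set of live assets that still use the line}
    Returns the IDs that were resuscitated in this pass.
    """
    resuscitated = []
    for line_id, line in lines.items():
        in_use = bool(references.get(line_id))
        if not in_use:
            line["location"] = UNUSED_FOLDER
        elif line["location"] == UNUSED_FOLDER:
            line["location"] = line["home"]   # back to its original location
            resuscitated.append(line_id)
    return resuscitated


def translation_batch(lines):
    """Only lines outside the unused folder go out for translation."""
    return sorted(i for i, l in lines.items() if l["location"] != UNUSED_FOLDER)


lines = {
    1: {"home": "ui/menus", "location": "ui/menus"},
    2: {"home": "ui/hud", "location": UNUSED_FOLDER},
}
references = {1: set(), 2: {"hud/minimap.widget"}}
revived = sweep(lines, references)
batch = translation_batch(lines)
```

In this example, line 1 has lost its last reference and is parked, while line 2 is referenced again and returns to `ui/hud`, so it appears in the next batch.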

Concept art from Tom Clancy’s The Division 2


Approval and locks

We had given up direct control of text creation, but given the nature of game development cycles, we still had to keep control of the timing. For most of the production we can keep the gates to the database open, and the lines created by the development team will get imported automatically at a cadence that we control. Once every day is usually enough, but it could also be increased to once every few hours, or set to a custom schedule at determined times. However, once we are nearing the start of translations it is important to know what new text goes in and when. For that purpose, the pipeline can be switched from automatic imports to the approval-required mode. When approval-required is enabled, the new and modified lines are gathered in a state where they are usable and can be seen in-game to test them, but they are not yet part of the text that goes out for translation. This enables us to track down where the text came from and put it on hold for as long as needed, which comes in handy during the closing phase of the game, where only a handful of changes are allowed, or when we’re developing more than one product at the same time and we need to control the timing of the releases. For instance, during the post-launch phase of the game, more than one product can be in development in parallel, but we need to make sure that only the content for the next product is being released.
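The two import modes can be pictured as a gate in front of the translation set. The sketch below is a simplification under stated assumptions: the `ImportGate` class, its method names and the `author` field are hypothetical, and scheduling of the automatic imports is left out.

```python
from enum import Enum


class ImportMode(Enum):
    AUTOMATIC = "automatic"
    APPROVAL_REQUIRED = "approval_required"


class ImportGate:
    """Hypothetical gate between developer-created text and translation."""

    def __init__(self, mode=ImportMode.AUTOMATIC):
        self.mode = mode
        self.pending = {}    # usable in-game, not yet sent to translators
        self.approved = {}   # eligible for the next translation batch

    def submit(self, line_id, text, author):
        entry = {"text": text, "author": author}  # author kept for tracking
        if self.mode is ImportMode.AUTOMATIC:
            self.approved[line_id] = entry
        else:
            self.pending[line_id] = entry

    def approve(self, line_id):
        self.approved[line_id] = self.pending.pop(line_id)

    def in_game_text(self, line_id):
        # Pending changes are visible in-game so they can be tested.
        entry = self.pending.get(line_id) or self.approved.get(line_id)
        return entry["text"] if entry else None

    def translation_batch(self):
        return {i: e["text"] for i, e in self.approved.items()}


gate = ImportGate()
gate.submit(1, "Play", author="ui_team")
gate.mode = ImportMode.APPROVAL_REQUIRED
gate.submit(2, "New season", author="live_team")
before = gate.translation_batch()   # line 2 is held back
visible = gate.in_game_text(2)      # but it is already testable in-game
gate.approve(2)
after = gate.translation_batch()
```

Holding lines in `pending` rather than rejecting them is what lets content for a later product stay in the build without leaking into the current translation batch.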

Debug tools

Another pillar of the pipeline for empowering users was having strong debug tools that are as useful for testers to spot bugs as they are for developers to fix and prevent them.

From early on, we worked closely with the UI team to make them aware of the constraints for different languages, from variable length — where German and Spanish translations can sometimes take 40% more screen space than English — to the special height constraints needed for Asian fonts.

This early collaboration resulted in a very flexible UI that was designed with all languages in mind, not just English. Resizable text boxes will expand as needed to fit the lengthier translations without compromising the look and restrictions of the design. Scroll bars and scrolling text will be enabled when the text box needs to remain static. At the UI team’s suggestion, to help them test their design early on, we developed a fake language that we call Debug Worst Case. This mode takes the longest translation of each line from all languages and displays them on the same screen. The result is highly unreadable except by the most extreme polyglots, but it’s extremely useful for spotting problematic areas and adapting the design in time. We do a selective translation of menus to load the Worst Case language with data, since waiting until all translations are in at the end of the project would be too late to make meaningful changes to the design.
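The core of a worst-case language is small enough to sketch. The version below is an approximation that picks the longest translation by character count; a real implementation would more likely measure rendered width with the actual font. The `worst_case` function and the data layout are assumptions made for illustration.

```python
def worst_case(translations):
    """Build a fake language from the longest translation of every line.

    translations: {line_id: {language_code: translated_text}}
    Character count is used as a rough proxy for on-screen width.
    """
    return {
        line_id: max(by_language.values(), key=len)
        for line_id, by_language in translations.items()
        if by_language  # skip lines that have no translations yet
    }


translations = {
    1: {"en": "Settings", "de": "Einstellungen", "es": "Ajustes"},
    2: {"en": "Back", "fr": "Retour"},
    3: {},
}
debug_language = worst_case(translations)
```

The resulting screen mixes German and French in this example, which is why the output is unreadable as language but useful as a layout stress test.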

Conclusions and next steps

With the ever-growing scope of productions, budget and time are becoming ever more pressing challenges to solve. A waterfall approach where localization waits until all content is ready to start translations and recordings is no longer viable, if it ever was. Early involvement in production also means having the opportunity to make an impact on features that are directly related to the department, such as subtitles and closed captions. Because of this, the department participated in an accessibility talk in London organized by the IGDA-GASIG last June. Localization is just the first layer of accessibility, and it only makes sense to be involved in the development of the immediate features.

Because of the long post-launch life of games, we need to be more flexible than ever in order to support the short turnaround cycles of live updates, and respond to the community in the fast-paced manner they expect.

Games localization is coming of age, and is shifting away from dreaded last-minute pushes, often done remotely without direct contact with the development team. It’s now becoming a collaborative process in the complex engineering effort of shipping a game. It’s clarifying needs, increasing awareness and making content creators responsible from the start.

The future is exciting, with new technologies to explore. Machine translation and advanced text-to-speech technology with multilingual support are promising areas of investment, and will surely soon become essential to supporting the scope of future games. With the continuous improvement of the Snowdrop pipeline, we aim to further reduce hurdles for developers, and become even more agile and sustainable for translators.

A few months after The Division passed the internal milestone marking the start of production, I transitioned to an associate producer role following a team restructuring. I’ve kept that position for the rest of the development of The Division and The Division 2, still working directly with the localization department, as well as audio and intermittently with narrative.
