Community Lives: Mojito: birth of a new community

Communities don’t just spring to life with thousands of members and well-defined goals and the resources to implement them. Communities grow from modest beginnings, morphing and taking on their own identity as they evolve. Fortunately, in our industry, we have a newly-formed community that makes an ideal case study to help us understand just how the processes of development can start and take shape. I’m referring to Mojito, a continuous localization application developed by Box, the online file sharing and content management platform. In this case, the germ of the envisioned community is a small, dedicated team of highly talented developers in Box, Team Moji.

Mojito started as a Box internal hackathon project in August 2014. It was first built on top of an old system to automate recurring manual tasks such as downloading and uploading strings and exchanging them with vendors.

Since then, the team has expanded the product definition of Mojito and decided to build a standalone application to support the entire software localization process at Box. By December 2015, Mojito supported Box’s web application, and by February 2016 it was supporting all Box products. That same month, Mojito went through a refinement process with the aim of being open sourced. The open source Mojito was launched in August 2016. Open source has been a part of the Box technology stack since the company’s earliest days in 2005, and in 2014 Box launched opensource.box.com to host all of its projects.

First things first: the name. I was surprised to learn that Mojito owes nothing to its namesake, the Cuban cocktail. The name is derived from the team’s name, which is Moji. Box’s globalization effort was launched in Japan and moji is Japanese for character. We all know this from the now ubiquitous and Unicode-approved emoji we use in our messages and emails. The Japanese term mojibake, which is used to indicate corrupted characters, was what led to the name Team Moji and consequently Mojito.

Team Moji was tasked with responding to the question, “How do you localize continuously without compromising the integrity of your apps or breaking your development process?” The answer seemed to lie in the choice between buying an existing app and building one. The problem with buying software is that you are vulnerable to dependencies. Licenses, as anyone in the language industry knows, can be burdensome. Prices and constant updates can be a headache. Even worse, users do not always have full control over their own work. Consequently, when the team examined the feasibility of building their own app, the positives in this solution soon became apparent. Box is global in its operations and the team was already active in the localization community. The beauty of sharing in a community is that you learn. You also teach, and that open cooperation had an irresistible appeal. Once the team grasped just how much other companies could benefit, the decision to open source the tool followed logically.

Open source is, of course, now itself a global phenomenon with a long track record of enabling all manner of computer-based activities. It may not be the universal panacea that its evangelists proclaim, but inviting gifted collaborators to share in the development process encourages innovation that results in safety, security and stability. Furthermore, given the multicultural and multilingual nature of the user community, the build option was clearly the best way to go.

Anyone who has ever had the slightest involvement in specifying software applications knows just what a headache that can be. Nailing requirements alone can be like tackling an assault course blindfold. This has been known to software professionals for decades, and many methodologies and approaches to software development have been adopted to try to overcome the barriers. One of these is rapid prototyping, and this in turn has led to the advent of hackathons. These events, now popular fixtures in software communities, are held regularly to facilitate intensive collaboration amongst a number of specialists with the aim of creating new products and services. With little more elaborate a requirement than creating a continuous localization platform, senior software engineer Jean Aurambault says, “The first version of Mojito was just a hackathon project.” But this was just a beginning. The team built on that impetus and, with much additional effort, designed and implemented a solid platform.

Team Moji

The diversity of the Mojito build team of senior software engineers has, it seems, contributed to ensuring that Mojito meets the needs of a global community of users. But it all started with a small, dedicated team of Box employees coming from diverse backgrounds.

Will Yau, senior software engineer, has been at Box for over four years now and open-sourcing Mojito has been an important milestone for him. He became one of the founding members of the globalization team at Box almost by accident, when he was asked to help Box tackle international growth. Yau seized the opportunity and realized how much working in globalization resonated with him. “Looking at how my parents would use modern-day software, if Chinese is not available to them, there is no way they can use it.” Such personal motivation seems to pervade Team Moji. Yau adds, “At Box, we know very well how much effort and dedication it takes to write exceptional software, however, localizing software should be easy. That’s why we wanted to share Mojito with everyone. Mojito is a free tool that is fully open sourced, and it takes only a few minutes to set up.”

Jean Aurambault is a full stack engineer and has been working on building and integrating localization tools as well as driving internationalization efforts to deliver improved global products. He is a globetrotter, always looking for new destinations and cultures. Getting involved in the localization world was an unplanned journey that just ended up matching his personal interests! It started in France where he was working on a localization platform backend for Yahoo!. He later got the opportunity to join the team in the United States and learn more about the industry. When he joined Box three years ago, he focused more on globalization but soon the need to build a localization platform surfaced. “Every day, millions of people use Box around the globe, it was important for our team to build a tool that would allow us to give users a seamless experience. Mojito allows us to consolidate all of our software localization efforts and enables us to search, edit and bulk-manage strings across all of our products and languages within one environment. Having full control over software content is critical no matter how big or small your company is.”

Senior software engineer Adrien Loison has been working at Box for almost four years. Loison, who has been on Box’s globalization team from its creation, felt dissatisfied with the French version and sought to bring it up to the same quality as English. “Although I had no previous experience with globalization, I was excited to get other languages to the same quality as English. Building Mojito has been a big part of the journey and I am happy we finally open-sourced it.”

Hanna Kanabiajeuskaja is the localization product manager at Box. Her experience in the language industry is already varied and extensive. This has given her insights into localization workflow that have proved to be invaluable in making Mojito live software that delivers what it promises. Her journey brought her to the localization world very early in life when, at the age of eight, she received an electronic Tamagotchi dinosaur for Christmas. She recalls, “It was pretty easy to configure. All I needed to do is set the time, and I was ready to go. I was very keen on technology, and I wanted to play with my new toy all the time. But soon I started noticing that my dinosaur was different from my friends’ pets. In the daytime, it would snooze. At night, however, it would wake me up and demand that I feed it. It took me a few weeks of terrible night sleep to realize what was wrong. While setting the time, I mixed up AM and PM. Truthfully, I didn’t know what they were at the time.” Since that early localization confusion, she has graduated to become a translator, interpreter, reviewer, terminologist and, finally, localization product manager. In all of these roles, she became intimately familiar with localization workflows and issues that most tools don’t address. “Serving the world’s largest enterprises means that Box must be localized into many different languages and Mojito represents a simple way to continuously localize products and reach a global user base,” said Kanabiajeuskaja.

Jee Yi, senior software engineer, joined Box in January 2016 and brings the same driving commitment to building software that dissolves language barriers and allows free communication. Her enthusiastic support for making Mojito open source is another factor uniting the team. Her family immigrated to the US when she was 14 years old. Growing up in Silicon Valley naturally led her into a software engineering career. Her first globalization experience came when she was assigned a project to convert an ANSI ODBC driver to Unicode at Hewlett Packard. However, it was after joining Yahoo! in 2011, where she met Jean Aurambault, that she started working on Yahoo!’s localization platform, called YALA. YALA was used by localization project managers to manage translations for all products at Yahoo!. After Aurambault left Yahoo! to join Box, Yi led the team to build the next version, called Dragonfly. It was a self-serve localization platform built for all developers at Yahoo!. Yi also worked on in-context review for iOS using the iOS simulator, for which she has filed a patent. The idea of open sourcing Mojito drew her to the Box team. “Box is all about helping people get their work done efficiently, wherever they are,” said Yi. “Mojito is an extension of this mission and a result of our dedication to make software localization efforts as efficient as possible. Many engineering tasks are automated, so automating localization was a natural choice for us.”

The workflow

Workflow is, of course, a much-studied business process because it describes repeatable patterns that guide activities through many different stages toward a reliable end. With localization workflow, the added element of multilingual content requires vigilance and accuracy to avoid those aforementioned mojibake and other glitches.

Localization has increased the complexity of the project life cycle and this demands greater management resources. It’s impossible to envisage today’s business and cultural environments without the benefits of widespread automation. Translators are all too aware of this and the advent of localization engineering has spawned many tools to assist in the process. Where does Mojito fit in the big picture of all these competing ventures? The simple fact is that most continuous localization apps are offered by companies at a price. Mojito has been designed as a foundation that will cover continuous localization at any company and if it doesn’t cover their particular use cases, they can tailor the software to their specific needs. That’s the strength and beauty of the open source model.

The buzzword is “agile.” Given the startling diversity of content that can exist in a single business entity, companies must be able to adapt instantly to evolving business requirements. Furthermore, as the pool of commercially-viable languages grows, agility must at times become acrobatic to meet needs. Mojito owes its existence to the fact that Box is an enterprise company, which means that they take the initiative to make things happen. As they spread their commercial wings and cover more and more of the globe, the content they generate proliferates at a rate of knots. How then to maintain standards of quality, reliability and security? The systems analysts who write up use cases (statements describing how users will use technology to achieve objectives) will specify how goals should be reached. But Team Moji took a characteristically different approach in building Mojito by adopting, to use their word, a more “holistic” approach to solving the problems of continuous localization. This enabled them to envisage their system as more than just the sum of its parts. Again, the decision to open source the project was vindicated as this helped them build a clean, lightweight and scalable product fit for purpose and able to be refitted for other users’ needs.

For those unfamiliar with the localization process, the goal is to run an application as if it were created for a specific culture by someone from that culture. A good translator is, of course, adept at rendering a target text with much more than semantics in mind. However, in this era of increasing automation, constant change in response to customer requirements and inadequate numbers of well-trained language professionals in our community, any improvement in the flow of work along the supply chain is more than welcome. Mojito addresses the situation in a straightforward and efficient manner.

The difference in the localization process before and after Mojito is telling. The life cycle of a string in a manual process begins with extracting it from the source code into a localization resource file. Resource files, of course, come in a variety of formats depending on the application they are used for: .strings, .resx, .csv, .xml and so on. Next, for each product, a developer has to export a resource file and send it to a localization project manager using a preferred file transfer protocol. Even if only one string was updated in the application, the entire resource file has to be sent. The localization project manager sends the resource file to a project manager on the language service provider (LSP) side. The LSP project manager then sends the file through preprocessing, which converts various files into a unified format, usually XML Localization Interchange File Format (XLIFFs). XLIFFs are files designed to allow data selected for translation to be passed seamlessly during localization. The entire resource file goes through existing translation memories (TMs). This causes problems because if any corrections were applied to any of the strings on the client side, they may be overwritten on the LSP side if TMs weren’t synced. The resource files are translated. Another potential problem is that in the manual process, if placeholders get broken, there is no way to check that automatically until the strings go back into the product code. Resource files get converted back from XLIFF into .strings, .resx, .csv, .xml and so on. But as seasoned localization pros know, files do become corrupted, and oftentimes there is no way to catch the problem at this stage. If all is well, files go back to the client localization project manager. Developers grab the files and create a resource bundle, English plus all translated languages. Finally, strings are inserted back into the product code.

Automation is everywhere in this age and will only spread further and wider as more and more laborious tasks are computerized. Localization is a case in point, as its automated life cycle shows. Mojito connects directly to the repository where resource files are stored, and strings are picked up automatically as soon as they are updated. All strings throughout all products and languages can be displayed within Mojito, where translators do their work. Alternatively, strings can be exported in the form of XLIFFs and sent to translators to be translated offline in their own computer-assisted translation tools. When translation is done, whether through Mojito Workbench or offline, strings are checked automatically for anything that can potentially break the code. Corrupted strings are blocked and displayed through Mojito’s user interface. The Mojito database is the governing TM. Mojito also protects against overwriting strings if they have already been sent for review or approved. All new strings are automatically pushed back to the original repository as they appear.
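
To make the idea of an automatic integrity check concrete, here is a minimal sketch in Python of one such check: confirming that a translation carries the same placeholders as its source string. It is an illustration only, not Mojito's own code. The placeholder syntax it recognizes (printf-style tokens such as %d and brace-style tokens such as {name}) and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Assumed placeholder styles for this sketch: printf-style (%s, %d, %@, %1$s)
# and brace-style ({0}, {name}). Real resource formats vary.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sd@]|\{\w+\}")


def placeholders(text):
    """Count each placeholder found in a string."""
    return Counter(PLACEHOLDER.findall(text))


def check_translation(source, target):
    """Return a list of problems; an empty list means the placeholders match."""
    src, tgt = placeholders(source), placeholders(target)
    problems = [f"missing {ph}" for ph in sorted(src - tgt)]
    problems += [f"unexpected {ph}" for ph in sorted(tgt - src)]
    return problems


if __name__ == "__main__":
    # A translator has renamed the placeholder, which would break the app.
    print(check_translation("Hello {name}", "Bonjour {nom}"))
```

A string that fails a check like this could be held back and shown to the translator for correction instead of travelling all the way into the product code.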

Mojito’s automated process is sleek and purpose-built to localize. Its advantages are compelling. It is repository-based and not file-based. In other words, it works on entire multilingual projects however large they are and not by individual versions for each and every language. It also allows users to manage strings across all repositories and languages at the same time. This is very important and unique to Mojito. Why is this so great? What does that mean? It means first that the entire client team can follow the progress of localization through the Mojito interface. They can make sure that nothing is released before localization has been completed. Second, an enterprise’s entire product portfolio can be handled in this way. If you want to bulk-change terminology, you can search all strings that you need to change, add a comment and send them for retranslation. This is much easier than scavenging hundreds of files for the right term and changing things manually. If you are a translator and want to translate all new strings across the entire product portfolio, you can easily surface these strings through Workbench with just a few clicks — instead of going around and asking developers whether they have added anything new recently and then trying to find new strings in the files.
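
The bulk terminology change described above can be pictured with a small Python sketch. It models strings from several repositories in one in-memory list, searches them in a single pass and flags the matches for retranslation with a comment. The TextUnit class, its fields and the status values are hypothetical names chosen for illustration and do not describe Mojito's data model.

```python
from dataclasses import dataclass


@dataclass
class TextUnit:
    """One translated string. All field names here are hypothetical."""
    repository: str
    string_id: str
    locale: str
    target: str
    status: str = "APPROVED"
    comment: str = ""


def search(units, term, locale=None):
    """Find translations containing a term, across every repository."""
    term = term.lower()
    return [
        u for u in units
        if term in u.target.lower() and (locale is None or u.locale == locale)
    ]


def send_for_retranslation(units, comment):
    """Flag each unit for retranslation and attach a note for the translator."""
    for u in units:
        u.status = "TRANSLATION_NEEDED"
        u.comment = comment
    return len(units)


if __name__ == "__main__":
    units = [
        TextUnit("webapp", "new_folder", "fr-FR", "Nouveau dossier"),
        TextUnit("ios", "move_folder", "fr-FR", "Déplacer le dossier"),
        TextUnit("ios", "move_folder", "de-DE", "Ordner verschieben"),
    ]
    hits = search(units, "dossier", locale="fr-FR")
    print(send_for_retranslation(hits, "Use 'répertoire' instead of 'dossier'"))
```

The point of the sketch is that one query spans every product at once, which is what separates a repository-based tool from a hunt through hundreds of individual files.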

A high-level view of Mojito’s workflow for software localization illustrates the various stages of the process:

     Create a repository in Mojito with a list of languages for localization.

     The command line collects strings from the source code and sends them to Mojito ready for translation.

     The localization team can translate online or it can be done offline by sending XLIFF files to translators who can reimport them when their work is done.

     After translation, Mojito’s command-line interface is used to generate localized files and translations.
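
The four steps above can be sketched as a toy model in Python. This is not Mojito's command-line interface or API. The Repository class and its push, pending, translate and pull methods are hypothetical names used only to show how new and changed strings flow through such a loop.

```python
class Repository:
    """Toy in-memory stand-in for a localization repository (hypothetical)."""

    def __init__(self, name, locales):
        # Step 1: create a repository with its list of target languages.
        self.name = name
        self.locales = list(locales)
        self.source = {}  # string id -> current source text
        # locale -> {string id: (source text translated, translation)}
        self.translations = {locale: {} for locale in self.locales}

    def push(self, strings):
        """Step 2: take in the source strings; return ids that are new or changed."""
        changed = sorted(
            sid for sid, text in strings.items() if self.source.get(sid) != text
        )
        self.source = dict(strings)
        return changed

    def pending(self, locale):
        """Ids with no translation, or whose source changed since translation."""
        done = self.translations[locale]
        return sorted(
            sid for sid, text in self.source.items()
            if done.get(sid, (None, None))[0] != text
        )

    def translate(self, locale, sid, target):
        """Step 3: store a translation along with the source text it was made for."""
        self.translations[locale][sid] = (self.source[sid], target)

    def pull(self, locale):
        """Step 4: build localized content, falling back to source text when needed."""
        done = self.translations[locale]
        localized = {}
        for sid, text in self.source.items():
            translated_from, target = done.get(sid, (None, None))
            localized[sid] = target if translated_from == text else text
        return localized


if __name__ == "__main__":
    repo = Repository("webapp", ["fr-FR"])
    repo.push({"greeting": "Hello", "bye": "Goodbye"})
    repo.translate("fr-FR", "greeting", "Bonjour")
    print(repo.pull("fr-FR"))
```

Because each translation is stored with the source text it was made for, a later change to the English string automatically puts that string back in the queue, which is the essence of the continuous part of continuous localization.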

With these steps in place, new and updated strings are collected automatically, and the result is continuous localization through a flexible and easily configurable process. Anything that Box open sources goes through a review process. Mojito was reviewed by a number of engineers to make sure the build that they are open sourcing is stable and ready to go out to the world. Box’s requirement is that any application it open sources must have been stable for at least six months. Mojito is downloadable from GitHub and can be used on any system that supports Java (Mac, Windows, Linux and so on). On Mac, Mojito can be installed using Homebrew.

The future

In an effort to raise awareness of Mojito, Team Moji is actively engaging with the language industry and presenting at various venues. They have posted announcements on social media and they are answering questions that come to them through various channels. They are also working with three companies and a nonprofit, Translation Commons, that have expressed interest in using Mojito, and are helping them integrate it into their systems.

There are a few features that would add a lot of value to Mojito, for example, extended translation workbench capabilities, advanced translation memory leveraging, automated terminology checks and translation history and revision control. Team Moji encourages the open source community to contribute to these and other features.

They are fully committed to supporting and expanding Mojito, but they acknowledge that it takes time to build new features well. For now, they will monitor the project to make sure that Mojito is working well for everyone in the localization community, collect feedback and answer questions. So, we are at the beginning of what promises to be a compelling story for some time to come.