Enterprise Innovators: Expanding reach through crowdsourcing

If Facebook were a country, it would be the third largest, behind China and India. The social networking site, which announced its one billionth user in October, leads the social media movement.

Ghassan Haddad, director of internationalization for five years at Facebook, was himself at the forefront of another great movement, translation crowdsourcing through engaging user communities. Prior to joining Facebook, Ghassan was director of software engineering and localization at PayPal. With a PhD in linguistics from the University of Illinois, Ghassan has over 20 years of experience in language research and technology, management and software development. He was interviewed from his office in Palo Alto, California, during his last week at Facebook.

Thicke: Ghassan, you’ve decided to take a well-deserved break after five years as director of internationalization at Facebook. This seems like a good time to reflect on your contribution to both Facebook and the localization industry in general. What inspired you to join Facebook in the first place?

Haddad: Five years ago, during my interview with the company’s 23-year-old CTO, he indicated that under no circumstances did he want internationalization/localization to delay product development. The challenge was tough but exciting because it was the trigger to open the door for innovation and that is why I actually decided to join Facebook. Not only are they willing to accommodate certain things like going out of the box, they actually encourage it. The overall culture encourages innovation.

Thicke: Tell me more about this culture of innovation at Facebook.

Haddad: At Facebook there are two statements you hear a lot. One is that at Facebook the work is never done; we are just starting. You could go to the bathroom and there would be a sign saying, “This toilet is 1% done.” Another sign says, “Go ahead and break things.” We still have that sign. It emphasizes to everyone in the company that they shouldn’t be afraid to break things, that they should never hold back from trying things because they are afraid to fail. Even in translation, there were several occasions where we did break things. We fixed them, adjusted our environment to prevent a recurrence, and moved on. At other companies where I worked, such occurrences were treated as emergencies, and the focus was on managing up, down and sideways, mainly to satisfy people that action was being taken, rather than on the action itself.

Thicke: As director of internationalization at Facebook, you were the first to implement large-scale crowdsourcing.

Haddad: We didn’t invent crowdsourcing. It existed in different formats. Google had even tried its hand at it, then gave it up. We were the first people who validated the crowdsourcing model on a massive scale. In social media, you have the audience. Couple that with the will to make it work and you have a winning formula.

Thicke: Why did it work for Facebook?

Haddad: Facebook had a community. At the same time, it had a very strong technical culture, and still does. You need both. When you talk to people, they are either on the technical side or the linguistic/project management side. Some companies are driven completely by engineers who believe the world could be programmed in a certain way. We now know that when it comes to languages and to engaging people, it’s not as simple as that. I told the engineers from the very beginning that this was not strictly an engineering project. I don’t believe crowdsourcing can function in a simple algorithmic way. Simply engaging a large number of people won’t get you the result. A few times during my first year at Facebook, I had the temptation to kill crowdsourcing, but I kept reminding myself that a big part of my mission was to find ways to make it work, rather than reasons to abandon the model. It did not survive on its own. It survived because of the combination of community, technology, engagement, a different vendor relationship model and the willingness to try crazy things. There is a psychological aspect to it, as well. It’s about engaging your users. When you put it out, it should be voluntary and it should be fun, rewarding and engaging. Most of the comments from the community said this is one of the most fun things to do on Facebook.

Thicke: How did you make it fun?

Haddad: I would say it was both fun and rewarding. Translation is itself a fun activity, as we heard from our audience. We make it easy, we make it rewarding. Inline translation and the ability to have your translation be visible to everyone in the world instantaneously are part of what made it fun and rewarding. I remember hiring a German linguistic contractor in the early days to check the quality of community translations and fix obvious errors. She called me one day to tell me that she couldn’t believe that I was paying her to do this work!

Thicke: Dotsub, which manages the crowdsourced subtitling of the TED Talks, uses a variety of techniques to engage translators, such as recognizing contributions. How do you reward your translators?

Haddad: Financial rewards are a no-no. Instead, we give translators classifications and a leader board that shows the top translators. We also established Facebook groups for them to interact, where discussions can actually go beyond just translations, and we held numerous face-to-face engagements where we invited some people to coffee or dinner, or gave simple gifts.

Thicke: Why are financial rewards in crowdsourcing a no-no? Do you agree with Dan Pink’s finding that financial incentives can actually be a disincentive?

Haddad: Giving financial rewards is like opening a Pandora’s Box. For one thing, I believe that they can add unnecessary and unneeded legal complexity to the process. Additionally, you will naturally not pay every contributor, so how do you decide who to pay and who to exclude, and are you willing to put the time and effort to engage in discussions and justifications for making these decisions when dealing with hundreds of thousands of people? Remember that in many cases, people ask us to add their languages to the mix and some form groups to convince us that if we do, they would jump in and do the translation.

Thicke: What languages were done using crowdsourcing?

Haddad: Every language was initially done with crowdsourcing. Back in 2008, only three were not. So far 76 languages have been released, and around 30 are still in progress. When there is a passionate community, like with Welsh, Catalan and Basque, the community jumps in and translates very quickly.

Thicke: That’s impressive. Is the translation work finished?

Haddad: There isn’t a single language at Facebook that is “complete.” We have a weekly push, but the UI changes happen daily. There are currently about 120,000 strings (five years ago, it was 20,000), and we modify a couple of thousand each week. If you don’t have a robust technology to manage that, or a highly engaged community, there is no way you can catch up with the changes on our site.

Thicke: With so much content to translate, how do you decide where to start?

Haddad: I asked the engineers how to track the traffic so I could decide what to prioritize. We now have metrics on which pages are most visible and which strings appear on the most visited pages. I can see which of the 120,000 were the most visible in the last three days. But it changes all the time. 20,000 to 30,000 will have captured most of what people see. If you think about it, 99% of what people see is about 10% of what’s there, and that’s true about almost every web or software application. If we are engaging professionals, we can send them the most important stuff to work on.
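The prioritization Haddad describes can be illustrated with a minimal sketch. It assumes per-string view counts are available from traffic metrics; the function name, the data layout and the sample numbers are hypothetical and are not taken from Facebook's tooling.

```python
def strings_covering(view_counts, target_share=0.99):
    """Return the most-viewed string IDs that together reach target_share of all views.

    view_counts maps a string ID to how often it was displayed in the
    measurement window (for example, the last three days). Hypothetical layout.
    """
    total_views = sum(view_counts.values())
    if total_views == 0:
        return []
    # Rank strings from most to least visible.
    ranked = sorted(view_counts.items(), key=lambda item: item[1], reverse=True)
    selected = []
    covered = 0
    for string_id, views in ranked:
        if covered >= target_share * total_views:
            break
        selected.append(string_id)
        covered += views
    return selected


# Made-up sample data: a few strings account for most of what users see.
sample_counts = {"home.title": 700, "nav.friends": 200, "nav.photos": 90, "help.legal": 10}
priority_list = strings_covering(sample_counts, target_share=0.95)
```

The resulting short list is what could be sent to professionals first, in line with the observation that a small fraction of strings accounts for nearly everything users see.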

Thicke: So how do you find community members who might want to translate?

Haddad: When we started back in 2008, we showed a “rooster” story to 20,000 people at random who live in the country, and said, for example, “Help us make Facebook available in France: click here to continue.” Today, translation is open to everyone, and the only time we do an invitation is when we notice that the engagement has slowed down and we need help to move things faster. One of the most recent languages we released was Khmer and the users were the ones who found us — they asked us to include their language, rallied their friends, and the translation was released a month later because of their activity.

Thicke: It seems nowadays that everything around crowdsourcing is rosy. Is there a downside to it?

Haddad: When you engage a random number of people, your ultimate responsibility is to protect your brand. A lot can go wrong. You are exposing yourself to a lot of risk.

Thicke: So how do you manage crowdsourcing without damaging the Facebook brand with poor translations, even sabotage?

Haddad: We give people different levels of authority. There is the average user, someone who just started translating. Then there is the user who has contributed a lot and has had others voting positively or even negatively on their translations. We programmatically evaluate which ones are most reliable and they become “whitelisted” users. As soon as whitelisted users contribute a translation, it goes live on the site. The third category is professionals. They translate confidential material and review contributed translations. The guideline for review is: unless the translation is wrong, don’t change it. Nowadays it’s a little looser, but at the beginning the instructions were to just fix mechanical or glossary issues. We have a limited number of words in the glossary, only those needed for consistency. If they wanted to change anything else, they had to justify it to me.
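The three levels of authority can be sketched roughly as follows. The interview does not say how reliability was evaluated, so the thresholds, field names and the approval-ratio rule below are assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the actual criteria are not given in the interview.
MIN_CONTRIBUTIONS = 50
MIN_APPROVAL_RATIO = 0.8


@dataclass
class Contributor:
    name: str
    contributions: int = 0
    positive_votes: int = 0
    negative_votes: int = 0
    professional: bool = False


def trust_level(contributor):
    """Classify a contributor as 'professional', 'whitelisted' or 'average'."""
    if contributor.professional:
        return "professional"
    votes = contributor.positive_votes + contributor.negative_votes
    if contributor.contributions >= MIN_CONTRIBUTIONS and votes > 0:
        if contributor.positive_votes / votes >= MIN_APPROVAL_RATIO:
            return "whitelisted"
    return "average"


def goes_live_immediately(contributor):
    """Whitelisted contributions are published at once, as the interview describes.

    The interview states this only for whitelisted users, so other levels
    return False here.
    """
    return trust_level(contributor) == "whitelisted"
```

Keeping the classification in one function means the thresholds can be tuned without touching the publishing rule.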

Thicke: That’s interesting that you don’t like the professionals to change much of what the crowd has done. Why is that?

Haddad: Translation quality is not a black-and-white issue. I’ve been working in this field for over 20 years and in my experience the only feedback you get typically comes from one internal employee, usually a marketing person. Then the person leaves and all of a sudden everything that was said becomes wrong and the next person has his or her own ideas. When I have 150 positive votes on a translation, why would I want to change it to something someone else decides is a better translation? One of the most interesting experiments to me was performed in the early days. I took a sample of Chinese translations voted on and approved by the community. I put the ones already approved by the community in a table with translations modified by professionals. Then I asked people to pick which they liked best, and to explain why. In 45% of the cases, the community translations were judged as better or equivalent; in 55% of the cases the professionals’ translations were better, but not significantly: either could have been selected. In another study, I asked professionals to score community translations in several languages with an A, B or F for fail. A full 93% were either A or B. In addition, we’ve run quality surveys on about 30 languages over several phases and received anywhere between 600 and more than 6,000 responses from our users. We continue to do so, and in almost all cases, the satisfaction rate among users is very high and that’s really what matters most to me.

Thicke: So that proved to you the validity of translations that come directly from the community using them.

Haddad: Yes. And the results also showed us how to tweak the program. Looking at the fails, we figured out what went wrong, so we modified the glossaries or programmed automatic feedback to the user. To give greater weight to trusted translators, one of the tweaks we did was to adjust the scoring based on the different levels of user trust. For example, if a positive vote by an average contributor adds one point to the score, that of a whitelisted user adds three or four or whatever.
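The trust-weighted scoring mentioned here can be illustrated with a small sketch. The weight of 3 for whitelisted voters is picked from the "three or four or whatever" range in the answer, and treating a negative vote as subtracting the same weight is an assumption; none of the names below come from Facebook's system.

```python
from dataclasses import dataclass

# Assumed weights: an average contributor's vote counts 1, a whitelisted user's 3.
VOTE_WEIGHTS = {"average": 1, "whitelisted": 3}


@dataclass(frozen=True)
class Vote:
    voter_level: str  # "average" or "whitelisted"
    positive: bool


def translation_score(votes):
    """Sum the votes on one translation, weighting each by the voter's trust level."""
    score = 0
    for vote in votes:
        weight = VOTE_WEIGHTS[vote.voter_level]
        # Assumption: a negative vote subtracts the voter's weight.
        score += weight if vote.positive else -weight
    return score


def best_translation(candidates):
    """Pick the highest-scoring candidate; candidates maps translation text to votes."""
    return max(candidates, key=lambda text: translation_score(candidates[text]))
```

With these assumed weights, one whitelisted approval outweighs two approvals from new contributors, which is the effect the answer describes.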

Thicke: But you do use professional translators at times. What is your experience engaging with translation vendors?

Haddad: We’ve had a fairly positive experience with our vendors, but much can be improved. Many people ask which vendor is best. That’s not the right question. All have capabilities. Just as important as vendor selection is the method by which you work with your vendors. The most unfortunate thing about most vendors is that they want to be treated like partners but don’t act like it. To be a partner, you need to be part of the solution. I can’t tell you how many times I’ve approached vendors to go beyond the basic translation services, but they don’t.

Thicke: Some people believe that crowdsourcing is about not paying for translations so you can save money. Is this true?

Haddad: You can save money and effort with crowdsourcing. If you have an engaged community and a good support mechanism, you can achieve significant savings. More importantly, though, crowdsourcing allows you to expand your reach beyond the standard languages; it is the main reason why translations into many languages happen at all.

Thicke: Could the costs of managing crowdsourcing actually run higher than paying a vendor for their translations?

Haddad: Although this can occasionally be the case, that’s generally not true for Facebook.

Thicke: There’s a lot of interest in crowdsourcing right now. A number of new players in the market are offering platforms to manage translation crowdsourcing. Do you think some sites could have a problem getting the critical mass needed to crowdsource?

Haddad: I am sure that this can be a problem in some cases. However, even though we’ve had hundreds of thousands of contributors, most of the translations are typically done by a handful of people in each language. The trick is not in getting hundreds to translate, but in ensuring that when the top contributors move on, you find ways to fill the void.

Thicke: What advice would you give an organization wanting to set up translation crowdsourcing?

Haddad: Crowdsourcing will require an initial investment, especially in technology. I released 35 languages during my first year at Facebook, and had only 1.5 people working with me on the nonengineering side, so you don’t need dozens of people to make it work. Be open to questioning everything you know, including notions of quality, and, most importantly, transform your mistakes into opportunities.

Thicke: Ghassan, I appreciate you taking time out to talk to me during your last week at Facebook. What’s next for you?

Haddad: First of all, I’m going to take a break and spend time with my family.

Thicke: If you had stayed at Facebook, what would your next project have been?

Haddad: My next focus would have been machine translation. It’s not feasible to translate everything that needs translating, for example to translate for all the developers. We’ve partnered today with Bing. It’s a no-brainer. I had to work hard to convince people that it was a good idea, that we should allocate engineering resources to do it, but in the end it was implemented. The Microsoft team helped a lot.

Thicke: What advice would you give someone wanting to follow in your footsteps?

Haddad: Don’t be too comfortable; don’t be afraid to experiment; think differently. Be more tactical.