Testing the Limits of Machine Translation
Interview by Cameron Rasmusson
NAME: Derick Fajardo
BIRTHPLACE: Guatemala City
UNIVERSITY: Brigham Young University and University of Phoenix
DEGREES: I have a BA in Translation and Communications and an MBA.
FUN FACT: I learned to speak English when I was 15 by watching Sesame Street.
The optimal process for translating and localizing mountains of copy has yet to be discovered. Project leaders are feeling out procedures and technological platforms to fine-tune the recipe that pinpoints the sweet spot between quality and quantity.
It’s a process of trial and error that Derick Fajardo knows all too well. As the head of machine translation (MT) at Harvard Business Publishing and in his previous roles, he’s worked on optimal solutions for improving translation efficiency while preserving quality. He’s accumulated no shortage of experience since the 1990s, and he recently took the time to share some insights with us.
Let’s start by talking a bit about your background. I know you’ve been involved with machine translation for some time now. What prepared you for the role you now serve at Harvard Business Publishing (HBP)?
So I’ll tell you a little bit about what happened with Harvard Business Publishing. When I joined HBP, our localization efforts were nonexistent.
HBP was looking for someone who had experience building teams, processes, and localization infrastructures. And they wanted to be able to think about the future and how our global reach might scale down the road as well. Our beginnings were pretty humble, and we started out with just one product line.
That helped us understand the intersection between technology and language and business needs and what that should look like. So I joined about 10 years ago with that task in mind, thinking about the future and organizing our efforts, connecting with vendors, and building relationships so we could scale down the road.
And what kind of experience did you have in this line of work that brought you to HBP? You’ve been working on this for quite a while, right?
My previous role was director of international product development at Nuance, and I think that was a plus. One of our biggest successes in that role was the development of Siri in about 40 languages. HBP saw that I had experience with large projects, large teams, and interacting with different business units. It seemed like the right time to transition from Nuance to HBP and bring some of that experience.
I’ve been in localization since the ’90s, and I’ve seen it start as a rudimentary set of processes and tools and technology and evolve over time. And so I think in 2012 when I joined HBP, we knew this technology had advanced quite a bit for localization. We were aware of localization tools, such as CAT tools, translation memories, and things like that, but there was no need for any large-scale type of technology to manage our translations. We did a lot of the work manually, but understood the need for internationalization, testing, and localization engineering. Over time, we started to build up the teams.
A lot of what we do is through vendors and through contractors. But that gives us the flexibility to scale when we need it. And when things slow down, then we think about our structures. But you know, over time, we’ve been increasing the number of products where we are trying to boost global presence.
I guess this is one way to put it: HBP produces a lot of content. We have three main business units: corporate learning, higher education, and Harvard Business Review, and each one of those produces a somewhat different type of content: case studies, articles, books, podcasts, videos, all kinds of things.
And then, of course, there’s Harvard ManageMentor: our interactive, digital-learning product. And so there’s a lot of variety there, and we have to be cautious about how we expand. We try to understand the demand in each market and for each language.
You have an expertise in machine translation, which you’ve watched develop over three decades. Briefly, could you talk about how the technology has evolved in that time?
I think my initial interaction with machine translation and artificial intelligence was when I worked for a company called Wildfire Communications. They were a small company, trying to create a text-to-speech technology for telephones. And so we were dealing with the creation of language models. In this case, it was a text model that got converted to voice, or a voice model that got converted to text. Which is very similar to what we do with machine translation, right?
In the case of MT, we are taking one corpus and converting it to an equivalent corpus. So we did some experimentation there, and this was in the early- to mid-2000s, and it was a challenge, a challenge for many reasons. One: The models had to be very large. And we didn’t have access to a lot of that modeling data, meaning equivalent content in Language A and Language B to be compared. And so at the time, when the internet was still getting going, there weren’t the language models that exist today.
It was also more difficult in some language combinations than others. So we started with the ones that were a little easier like English to Spanish, or English to Portuguese, and we had some success, but it was very limited to statistical models.
Machine translation has been around for a long time, and that’s the model that we were following. We were able to succeed in short phrase translations, expressions, and commands. But it didn’t take us far at the time.
Fast forward to 2017 or so when neural machine translation came into play, and suddenly we had the ability to take advantage of large neural networks. So the technology limitations and processing limitations were taken away.
Now our phone processors are as capable as many of the large computers. So they’re able to process a lot more data, larger language models, and multiple language models. With neural machine translation, we started doing some research at HBP to see what the industry was saying about it. We’re very cautious, so we wanted to do our own tests and language-pair testing with human professional translators to compare quality. As usual, English to Spanish is one of the language pairs where we have more data and demand. So we started doing experiments there and have been able to see some good success. We are testing to see if there is demand for some direct-to-consumer MT content.
So one of the big questions for us is the state of MT quality and what is considered good enough, right? We are very careful with our brand. And we’re very careful about what we release with the HBP name. So when we use MT, we have various models: in some cases we use post-editing, and of course we use professional translations as well.
Where do you see the technology going from here? How do you think it’ll develop and evolve over the next several years?
That’s the big question. Researchers and other people with more experience than me are trying to make estimates and guess what’s going to happen in the future. But I think from what I’ve seen based on the past few years and the experiences and successes and failures that we’ve had, we’ve learned that neural machine translation took a big quality leap in the beginning, and now we’re seeing it sort of plateau a little bit. I think it’ll require some large and significant development in research to have another leap like that.
Given the levels of quality we’re seeing today, some claim that MT is as good as human translation. And humans make mistakes. We sometimes are inconsistent. And if machines are anything, they’re consistent. So in our case, I think we will see a lot more use of machine translation in the hybrid model, with various levels of post-editing. Mostly, I feel that there will be an increasing need for translators and editors to adapt to post-editing.
Of course, I do think that there’s still a need for translations from scratch, and that will never go away. There’s marketing and advertisements and literary translations where obviously nothing can replace human creativity. But with neural machine translation, there are advantages with its ability to take large amounts of data for reference.
You already touched on it a bit, but maybe you could speak more about how you see human and machine translation fitting together.
Absolutely. In fact, I have a friend who has been translating for several decades, and she wants to remain relevant. So she wants to learn how she can adapt to stay relevant in the world of machine translation. And that’s what I think: There will be a need for some adaptation editing.
Proofreading is perhaps less interesting to some translators, who prefer the creativity of writing a new sentence from scratch, and that’s OK. I think there will be room for that. But for those who want to work with machine translation, I think it will require some training. It’ll require the ability to adapt to a new model and add that touch of creativity.
I think it’ll require a little bit of training because LSPs, for example, are very interested in speeding up the translation process. So they’ll put a lot of pressure on translators and proofreaders to get faster at editing and fixing translations. One risk I do see is losing some of the creativity that comes when you create a sentence from scratch. Machine translation can be fluid but oftentimes is not very creative. And so my fear is that proofreaders will become too accustomed to the machine translation style and lose some of their own creativity.
As with any profession, translators are good at staying current. They love to read and absorb language, and I think that will become even more important so they can stay fresh despite all of the contact that they will have with machine translation. Does that make sense?
It does. And I’m curious how those insights apply to what you’re working on now. I understand you’re working on a flagship project at the moment. Can you tell us more about it?
Well, we have three different business units. Our higher education business unit produces a lot of case studies and materials for universities. Our Harvard Business Review division creates the magazine and lots of articles and content on leadership and improving management. And then there’s corporate learning, where we have products that are suited for people at different levels in their corporate careers to become better leaders and managers.
That’s the area where this flagship product belongs. It’s called Harvard ManageMentor, and it’s a large interactive, digital-learning platform containing 41 courses on leadership, management, business acumen, and much more. Each one of those has its own sort of course material. And so they’re very large — they have videos, they have articles, they have assessments, so you can interact and learn through exercises. They have lectures from experts that participate in each topic as well.
So that’s the product where we’re experimenting now with machine translation. It’s very text-heavy content, with over a million words in its main product. So we’re trying to see how much MT can aid and speed up the translation process and help us get more efficient at translating that entire product, which is very complex, and the technology is very complex as well.
The experiment uses a machine translation engine and integrates it into this product. The different file formats require different connectors, so we’re experimenting with all of that, trying to see how much it would help. After we flip the switch and do the translations, we then figure out how good the quality is and how much post-editing it requires. Are there some areas that require more than others? And how much effort will that involve?
We’ve been doing that since late last year, and it’s in progress. We hope to have something ready towards the fall of this year for a pilot program where we will share it with some of our customers. If we can increase efficiency, then perhaps we can provide more languages and make it available to more people around the world.
When’s the next opportunity for the public to learn more about machine translation?
I recently joined the AMTA (the Association for Machine Translation in the Americas) committee. And we are in the process of planning the next conference, which will take place in Orlando from Sept. 12-16, and there will be over 50 presentations on everything related to machine translation. There’s a lot of exciting momentum around machine interpretation, for example, which is kind of a newish technology. These events are also great for networking. Now that we’ve been away from conferences for so long, it’ll be great to see people face to face.
With work that’s evolving this dynamically, I’m sure you need a chance to take a break and unwind. What keeps you busy outside of the job?
So recently one of my hobbies is nature photography, and I’m lucky enough to live in a place where there’s a lot of access to nature and mountains and lakes. I really like to go off-road and explore what’s out there, to really experience what happens when the sun comes up and goes down and how everything around us changes.
Cameron Rasmusson is the editor-in-chief of MultiLingual Media.