Jack Welde: Simplifying the process | MultiLingual February 2023

PROFILE

Jack Welde

Simplifying the process

There’s nothing quite like finding a workaround to a tedious task.

While AI, machine translation, and other advancing technologies have provoked some consternation about the future of professional work, it’s undeniable that they have the potential to eliminate an undesirable chore or two.

Just ask Jack Welde of Smartling. At the most recent LocWorld conference in San Jose, he won the Process Innovation Challenge prize for a new program feature that supplies translators with invaluable context. Better context means better, more accurate translations. How does it work? Well, perhaps it’s better if the man himself explains it directly.

Editor’s note: This interview has been edited for clarity and conciseness.

Congratulations on winning the Process Innovation Challenge at LocWorld in San Jose last year! For those who couldn’t make the event, could you break down the innovation you presented?

Thank you! We really enjoyed the competition, and were thrilled to participate. First, a bit of background on the problem we are solving:

An ongoing challenge for translators is a lack of context. Without context, a translator asked to translate a simple text string like “Home” might not know if this means where someone lives, or is a link to a company’s home page — or any number of other interpretations. Visual context can make this instantly clear. For example, if the word “Home” appears above a shipping address on an ecommerce site, it probably means where someone lives; alternatively, if “Home” appears visually in the top-left corner of a website menu, it probably means the home page.

Admittedly, “Home” vs. “Home” is a trivial example. But without context, similar ambiguity occurs all too often. The problem is particularly challenging for mobile apps, video games, and ecommerce, where text strings tend to be shorter in length and more ambiguous than longer paragraphs. Unfortunately, developers often don’t annotate what a string means and how it will be used in the application. Additionally, translatable strings can appear in seemingly random order in developer source files, making it nearly impossible to infer the meaning or context based on surrounding strings.

Moreover, modern applications are highly dynamic, using custom variables embedded within text strings to create personalized experiences in the live application. A translator might guess that “Welcome, {0}!” is a greeting message personalized with the user’s name (e.g., “Welcome, Jack!”). But it would be quite a stretch to figure out that the string “{0} by {1}” will ultimately be rendered as “Abbey Road by The Beatles” in a music app.
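To make the placeholder problem concrete, here is a minimal sketch in Python. It assumes the strings use positional `{0}`-style placeholders that happen to be compatible with Python's `str.format`; the interview does not say which templating format the applications actually use, and the variable names are hypothetical.

```python
# Hypothetical source strings, as a translator would receive them.
greeting = "Welcome, {0}!"
credit = "{0} by {1}"

# What the live application renders once the variables are filled in.
print(greeting.format("Jack"))
print(credit.format("Abbey Road", "The Beatles"))
```

The translator only ever sees the first form. Nothing in `"{0} by {1}"` reveals that the placeholders stand for an album and an artist.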

So context is everything in localization. Missing context leads to translator confusion, delaying projects. Or worse, a perfectly valid translation — but completely wrong for the intended context — creates laughable results, or even harm to the brand. With the right context, the translator gets it right the first time.

So if context is so important, why is it missing in so many localization projects? Because context is really hard to produce, especially for highly dynamic experiences like mobile apps, video games, and ecommerce. My online shopping experience has been customized for me, and my video game play is completely different from my friend’s experience with the same game. Obtaining visual context from dynamically-coded applications with an infinite number of potential experiences is so challenging that most people just give up. This is where Smartling’s innovation comes in. If you can record your screen, you can provide visual context for any localization project!

Creating a video screen capture is simple, and the capability is built into every major desktop and mobile operating system. You just record your screen while browsing your ecommerce website, using your mobile app, or playing your video game. Then you upload that screen capture video into the Smartling platform. In a matter of seconds, the Smartling platform extracts every relevant video frame into individual images and automatically matches the project’s translatable strings with the correct video frames, highlighting their precise locations on the video frame. Now, the translator knows exactly what they are translating, fully contextualized.

What are some of the technical aspects of this innovation? What is happening behind the scenes?

Smartling’s innovation was designed to be simple to use, requiring zero technical experience. But don’t be fooled by the simplicity of the user interface. There’s an awful lot of technology magic happening to make this work. First, we have to break the video into multiple individual frames, which might seem conceptually straightforward. But we want to discard duplicate (or highly similar) frames, and we want to identify the best frames, with the largest number of strings, to create a comprehensive contextual example for the translator.

To do this, we take a machine learning approach, combining computer vision and optical character recognition, to match the translatable strings against the extracted video frames. As you might imagine, the text on the video might be in any number of fonts or styles, it might be horizontal or vertical, and it might appear in different places on the screen. We also want to avoid duplicate matches, so a string like “Login” that might appear on every video frame should be matched only once, despite multiple appearances. We need to match complete strings within the video frame, not text that happens to partially appear within a larger text segment on the screen. And remember those examples where the translatable string includes coded variables, like “Welcome, {0}!”? The system needs to automatically match and highlight “Welcome, Jack!,” “Welcome, Cameron!,” or any other similar user greeting, regardless of the user’s name. These are just a few of the technical details behind this innovation.
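Three of the matching rules described above (match each string only once, match only complete strings, and treat variables as wildcards) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Smartling's implementation: it assumes the OCR step has already produced plain text lines for each frame, it uses regular expressions rather than machine learning, and the function names `placeholder_pattern` and `match_strings` are hypothetical.

```python
import re


def placeholder_pattern(source: str) -> re.Pattern:
    """Turn a string like 'Welcome, {0}!' into a regex in which
    every {n} placeholder matches any non-empty run of text."""
    literal_parts = re.split(r"\{\d+\}", source)
    return re.compile("(.+?)".join(re.escape(part) for part in literal_parts))


def match_strings(strings, frames):
    """Map each source string to the first (frame index, text line) it matches.

    `frames` is a list of frames, each a list of text lines, standing in
    for OCR output (an assumption made for this sketch).
    """
    matches = {}
    for source in strings:
        pattern = placeholder_pattern(source)
        for frame_index, lines in enumerate(frames):
            # fullmatch rejects lines that merely contain the string.
            hit = next(
                (line for line in lines if pattern.fullmatch(line.strip())),
                None,
            )
            if hit is not None:
                matches[source] = (frame_index, hit)
                break  # record each string once, at its first appearance
    return matches


frames = [
    ["Login", "Welcome, Jack!"],
    ["Login", "Abbey Road by The Beatles", "Login to continue"],
]
strings = ["Login", "Welcome, {0}!", "{0} by {1}"]
result = match_strings(strings, frames)
```

In this example “Login” is recorded once, on the first frame, even though it appears on both; “Login to continue” is never treated as a match for “Login”; and “{0} by {1}” is matched to “Abbey Road by The Beatles” on the second frame. A real system would also have to cope with OCR errors, varied fonts, and text orientation, none of which this sketch attempts.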

Among the many excellent ideas presented at the conference, what do you think made yours stand out with the judges?

First, I want to say I thought all the presentations were terrific. We have a lot of smart people in our industry solving many thorny problems, and they are using clever technology to improve overall productivity and efficiency. I was really impressed by their work!

I think the problem Smartling is solving is a perennial challenge for the industry, so it resonated with the LocWorld audience and the judges. Intuitively, everyone understands that the best translator in the world won’t know how to translate the word “Watch” without any context. Is it a button under a video? Or is it a wristwatch category label on an ecommerce site? Who knows? But the problem of getting context to the translator is traditionally so hard that it’s sometimes easier for people to look the other way, even when we might understand the downsides.

Smartling believes context is essential to the translation process. For over a decade, we’ve worked on perfecting visual context — for websites, documents, apps, you name it. And we can show our customers empirically where they have context coverage and where they don’t. We also use real data to show them the positive impact of proper context — on translation quality, turnaround times, and overall costs. But in some cases, it’s still been difficult for customers to provide context to the translator. We think this video context solution is a huge step towards solving this challenge.

As you mentioned in your conference presentation, context is vital for a translator, and they can easily make mistakes when it’s not available. Do you have any examples from earlier in your career that come to mind?

As a translation vendor, you never want to deliver a questionable translation to your customer. But I can remember one (somewhat funny) translation context problem from a decade ago with a Smartling customer. The customer’s website featured “virtual awards” for doing specific activities, and the awards all had clever names. Some of the activities were Apple-related, and the associated award was called “Jobs” (in homage to former Apple CEO Steve Jobs). But the company also had a careers page for people to learn about working there, which was called… wait for it… “Jobs.” Unfortunately, the careers page content was translated before the award content. Long story short, users who completed Apple-related activities on the Spanish site received an award that said something like “Profesión,” rather than “Jobs.” But we figured it out quickly, and the customer was good-humored about it. In fact, it focused both of us on really getting context right to avoid any future issues!

Sometimes the issue isn’t that the translator knows there are multiple potential meanings but isn’t sure which one to choose. Sometimes our human brains just don’t think of the other contexts. As an example, if the text was, “I went into a bank,” most people would naturally assume this means I walked into a financial institution to get some money. But did you consider other possibilities? If I told you the context was a flying story, as told by the pilot, then this means the pilot is turning the airplane. But if this was a story told by a sailor steering his boat, then you know he’s having a very bad day!

With that history in mind, what inspired the idea for this innovation? When did work on the visual context extraction project first begin?

This project started with a video game customer. Gaming is an important vertical for Smartling, and we help some of the world’s largest gaming companies to properly localize their products and create an amazing experience for their gamers. But video games don’t look anything like typical websites or mobile apps, the on-screen text is often sparse, and the game play is truly unique for every player — and every game! So visual context is a real challenge.

We learned that this company does a lot of testing of new game features on virtualized servers, and has both human and automated testing. One day the customer sent over a 15-minute video screen recording of an unreleased game, generated by its automated testing process, to show us the specific context for a tricky translation. But no translator has time to watch a 15-minute video, to find the specific strings in the game video, and compare them to the work he or she is translating. So we had to find a better way.

At first, the answer was for Smartling translation project managers to find the text in the video, screenshot the video frame, and highlight the text by hand. Obviously this was unsustainable. Could all of those steps above be automated? Could we use the test videos the gaming company was generating? Time for a hackathon!

From time to time, Smartling holds internal “hackathons.” Our company roots are in technology, and hackathons are regular events in many big technology companies. Over several days, teams of developers, product managers — or anyone with a good idea — can conceive, design, build, and demonstrate their innovation. It’s a great way to encourage creativity and collaboration throughout the company, and it can be very exciting — and competitive. But it’s really about doing something clever and innovative, to get us out of our comfort zone and create a completely new idea in a matter of days. Smartling’s video context innovation was born out of a hackathon, with a brand new approach to solving a customer challenge.

Naturally, the work that comes out of a hackathon is not production-ready. The code is practically duct-taped together, with little regard for performance, minimal optimization, and lots of bugs. But it’s the idea and the rapid execution that make hackathons so great, and we can pick winners to continue developing into production-ready solutions.

How difficult was it to develop a tool with this kind of flexibility? How did work evolve over the course of development?

One of Smartling’s core values is, “Good today, perfect tomorrow,” which is a shorthand way of saying that we should continually conceive of new innovations, build them rapidly, and make them good enough for customers to try out. Then, based on market feedback, we iterate rapidly and continuously improve on all our prior efforts.

We showed our gaming customer an early version, to get feedback. They loved it, so we began to share it with others, while we continued to build it out, including a focus on performance, security, and an intuitive user interface.

You mentioned in your presentation at LocWorld that this has resulted in a 13,200% increase in context coverage. Did that surpass your expectations at the outset of development?

I was blown away by the results. Context has always been important to Smartling, so much so that we display context data for every project in the Smartling TMS dashboard. A localization manager can log in and immediately see the exact percentage of strings with context versus without. It’s a front-and-center reminder of the importance of context.

After launching this feature, and making it available to our Enterprise customers, we were able to see significant (and growing) improvements in the overall context coverage in the platform. It’s clear that context has been a pain point for all companies, and this feature appears to be a huge improvement.

This is one example of machine learning eliminating some major headaches for translators. What are some of the other applications of the technology that excite you the most?

It’s worth mentioning that context also helps to improve machine translations, not just human translations. That might not be intuitive, so allow me to explain. When we talk about visual context, we typically mean the translator can see the text within the visual layout of the website or app and can comprehend the meaning of the text based on that visual understanding. But machine translation is also impacted by context. Recent breakthroughs in neural machine translation include context-awareness at the document level, such that the machine translations are contextually influenced by the surrounding text on the page. So context is more than just a visual aid for the human translator.

As another example, in-context review is still a very important part of a typical translation workflow. Whether it’s professional translation, MTPE, or trained MT that tackles the first step in the workflow process, a review step often follows to validate the translations. The best way to do a review is in-context, to see exactly how the translated text will appear in the final product. But without context, an in-context review is simply not possible.

At Smartling, we are advocates of “expert review,” in which the subject matter experts conduct the final review, in their native languages, within a beautiful visual context experience. We even see a world where monolingual experts within the Enterprise are participating in in-context reviews — not to validate specific translations from a source language into a target language, rather to validate the final product in their native (target) language, in context, just as it will eventually be viewed by the target audience. The only way this happens is if context is available, and Smartling’s video context innovation is a huge step forward towards that vision.

In so many ways, we are only seeing the beginnings of how these technologies will impact and transform work and society. What kind of a future do you envision 10 or so years from now?

It’s really hard to predict what the world will look like in 10 years. I’m reminded of the Bill Gates quote: “We always overestimate the change that will occur in the next two years and underestimate the change that will occur in the next 10.” I believe this.

Arguably, Smartling was the first translation platform to embrace “the cloud” a decade ago. We saw a future in which the world’s music, videos, books, content, calendars, communications — and everything else — moved to the cloud. We knew a cloud-based translation platform offered many potential benefits, including better collaboration, real-time visibility, better resource allocation, faster time to market, and lower overall costs. Unsurprisingly, just about every translation platform is cloud-based today, a decade later. But we probably overestimated the willingness of the industry to wholeheartedly adopt the cloud, and we underestimated how cloud-based compute platforms and storage would ultimately lead to neural machine translation and step-function improvements in the overall quality of MT.

As another example, within a decade of the launch of the iPhone, practically every adult in the world was walking around with a computer in their pocket. This development created major new innovations — for better or worse — in commerce, communications, entertainment, travel, navigation, socializing, payments, and a million other things. At the time of the iPhone launch, I don’t think most people would have predicted the unimaginable impact the iPhone would have on the entire world in just 10 years.

It’s worth remembering that there were only 36 million people on the internet in 1996 (I was one of them), and today there are 5 billion users. Based on that fact, the investor, Paul Graham, recently asked the thought-provoking question, “What do 36 million people use now that eventually 5 billion will?” A lot of people gave answers that I think probably fall victim to the “two year” fallacy from Bill Gates’ quote, overestimating the impact that shiny new things today will have on the world in the next two years. As such, if I was so bold as to try to predict what the world looks like in 10 years, I would try to think about what a fairly small number of people use regularly or continuously in their lives today but will be used by practically everyone in 10, 20, or 30 years.

My guess: artificial intelligence.

In the same way that the internet has had an undeniable impact on the world in the 27 years since 1996, and the smartphone has had a tremendous impact on the world in a decade and a half, I think artificial intelligence will have the biggest impact on the world in the next 10 years.

What does that mean? It means that just like the internet has transformed everything we do, AI will do the same. Yes, I still read books and watch movies and listen to music, but they are all delivered to me over the internet, and I consume them via an internet-enabled device. In 10 years, AI will have touched practically everything we do: how we drive cars, how we grow our food, how we allocate resources, how we entertain ourselves, how we practice medicine, how we conduct warfare, how we shop, how we socialize — how we live. It will touch everything in our lives, just as the internet and our smartphones have done. AI will likely just be something that is ever-present, just below the surface, making continuous small decisions and (hopefully) helping to guide and improve our day-to-day lives.

We’re seeing some of this already. Just about anyone who has tried out ChatGPT since its launch a few weeks ago has been blown away by the experience. Ask a few questions in a familiar chat-like interface, and ChatGPT dutifully provides robust, comprehensive answers, customized to your specific and unique questions. It’s incredibly impressive, and companies as big as Microsoft are scrambling to incorporate this amazing technology into their products. And yet, ChatGPT is far from perfect. It can be biased, it can be tricked, and it can be flat-out wrong on occasion. (But I also know a lot of humans who can be biased, can be tricked, and are often wrong…)

I’m not smart enough to anticipate all the ways that we will incorporate AI into our daily lives — and certainly there will be both good and bad things that come with it — but I’m confident AI is not a passing fad. I believe it has the same potential impact as the internet has had on the world, and in the next 10-20 years it will be used by 5 billion people, every single day. Likely we will overestimate the impact of AI in the next two years, and be utterly amazed by the impact of AI on the world a decade from now.

And while it won’t be perfect, I nevertheless remain optimistic for the future.

Is there anything else you want to mention that we haven’t covered?

I want to thank the wonderful Smartling team. The folks that conceived and built our video context innovation — and all the other incredible innovations in the Smartling platform — are extraordinary. I never cease to be amazed by the incredibly bright, talented, hard-working people we have on the Smartling team. They believe in our mission, and they are focused on helping our customers reach their goals. I couldn’t be more proud of them.
