Enterprise Innovators: Turbocharged MT Testing at Cisco

Cisco Systems, Inc., is the worldwide leader in networking for the internet. Founded in 1984 by a group of computer scientists from Stanford University, Cisco currently has annual revenues of around $40 billion and over 70,000 employees. Not surprisingly, Cisco is leading the way in machine translation (MT). Cisco began using MT in 2000 and was the first company to publish highly customized raw machine-translated content on the Web on a large scale. Cisco’s groundbreaking customer support function is managed by a combination of human translation and MT. As program manager of Cisco Services, Pablo Vazquez is the architect of MT solutions for Cisco’s customer support activities and is based in San Francisco, California.

Thicke: It’s no exaggeration to say that you are one of a handful of people working today who know MT training inside and out. What innovations are you working on within Cisco?

Vazquez: I’m looking at the MT/support-side maintenance mix, how crowdsourcing and MT can work together, source analysis, how to train an engine for a project, which engine is best for a project and other issues along those lines.

Thicke: Cisco is recognized as the leader on MT initiatives, starting with its well-known deployment of SYSTRAN back in 2000. Tell me about your job at Cisco as the architect of MT solutions for customer support.

Vazquez: I work with the localized versions of the Technical Support site. With respect to MT in particular, my objective is to continuously improve the output quality. I want to stop for a second here and explain this a bit. It is important to understand that I’m not trying to make MT engines better; I’m not an engineer. I work to improve the output. You can think of my role as that of an expert user of MT since part of my job is to train engines, analyze their output, then determine better ways to train them and analyze their output in the future. At Cisco we have a team that is able to be practical about MT and run a very lean operation, which allows us to deploy new languages and new workflows that save money, improve the translation quality and increase the volume of translation with the same budget.

Thicke: What kind of content do you translate with MT?

Vazquez: I support other operations within the company, but basically MT is for customer support knowledge base content on the Cisco Technical Support website, though there are plans to expand that charter in the future.

Thicke: Cisco has built an extensive customer support site, powered by both human and machine translation, that many enterprises today would like to emulate. What was Cisco’s original motivation for pioneering the automation of customer support translation?

Vazquez: Cisco is a global company with thousands of products and customers on all continents and in almost all countries. We therefore need to provide technical support in multiple languages. Our support model is multifaceted, like that of most IT companies. The facet that my group focuses on is classified as “self support,” where our customers would like to solve their problems by themselves using our website instead of calling our technical support engineers. This provides us with an opportunity to bring the support content and knowledge that we gather from other means into the multilingual web environment. The best and most efficient way to translate our support knowledge into multiple languages is through MT.

Thicke: Do you measure how many issues you are able to resolve through the website?

Vazquez: We measure everything. We view the site as a way of providing fast self-support for customers based on the knowledge that we gather from all our technical resources and support efforts. We deliver an efficient, rapid and easy-to-use support model for issues that are more common and well known. At the same time, we can focus our human support resources on more complicated, newer issues. Our rate of call shielding on older repeat problems is significantly above the industry average. Almost 80% of all our customer technical support issues are solved online in a self-service environment. To give you an idea of the traffic, we have more than two million unique visitors per month, for about 18 million page views.

Thicke: Cisco has quite extensive customer support material. How do you prioritize content?

Vazquez: Our attention is focused where our customers’ eyes are. We know which documents no one looks at and which have hundreds of thousands of hits a day. Some documents may have a million hits in a single day, while others have two or three visits per quarter. As you can imagine, we prioritize our work accordingly. We also use other metrics that may predict usage; for example, new products that we expect to be popular or documents that we know will get lots of traffic.

Thicke: What kind of quality metrics do you use?

Vazquez: As I said, we measure everything. We have various quality metrics. A good measurement of quality is how much translation memory (TM) matching you have. We always do TM matching prior to MT, regardless of the system, so we know what percentage of a document has been human translated and what percentage machine translated. Every document is a combination of both. It’s all machine translated, but there may be up to 80% matching or more from the TMs. Also, when we test different parameters, we automatically score the output. Another measure is satisfaction — quality for us is whether we solved the customer’s problem. Every document can be evaluated by the users as well as by the experts. As for expert evaluations, we have a lot of highly trained engineers who speak the target languages in-country, so we have the advantage of being able to tap those resources. We let them read the document and then ask them a series of questions about understandability, the quality of translation and so forth. We don’t let them evaluate single phrases, but rather the whole document as a unit. We are targeting understandability. I think of the analogy of being in a restaurant. What’s most important is that you can communicate what food you want and how spicy and so on and have the waiter bring that food to the table. That’s all you need to communicate and you are satisfied.
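As an illustration of the TM-before-MT step described above, here is a minimal sketch, not Cisco's actual pipeline. It assumes the translation memory is a plain dictionary of exact source-to-target matches (production systems typically also use fuzzy matching), and `machine_translate` is a hypothetical stand-in for whatever engine is in use.

```python
def translate_with_tm(segments, translation_memory, machine_translate):
    """Translate segments, preferring exact TM matches over MT.

    Returns the translated segments and the fraction that came from
    the translation memory (the "TM matching" percentage).
    """
    translated = []
    tm_hits = 0
    for segment in segments:
        if segment in translation_memory:
            # Human-translated text is reused as-is.
            translated.append(translation_memory[segment])
            tm_hits += 1
        else:
            # Anything the TM does not cover goes to the MT engine.
            translated.append(machine_translate(segment))
    coverage = tm_hits / len(segments) if segments else 0.0
    return translated, coverage
```

The coverage value corresponds to the share of a document that was human translated; the remainder is raw MT.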

Thicke: What languages do you manage with MT?

Vazquez: As a global company, Cisco works with many languages. There are approximately 90 on the website. Our MT team works on what we consider strategic languages: Spanish, Portuguese, Russian, Japanese, Chinese and French, with plans to add more languages in the future.

Thicke: Cisco is known for its longstanding relationship with SYSTRAN. Do you use other engines too?

Vazquez: I’m using everything. We have multiple engines in production. We have instances of SYSTRAN as well as PROMT in production, and in the testing phase we have Moses and a few prototypes that we customized ourselves. But basically those are my main engines today.

Thicke: How do you choose which engine to use?

Vazquez: Cisco bought the first set of licenses of SYSTRAN around 2000, and we’ve been using their engines since then for several languages. Moses is being used more experimentally, in phases, in different prototyping environments. SYSTRAN is used for several of our European and Asian languages; we currently are using PROMT for Russian.

For testing different approaches, Cisco has the advantage of a large enterprise, with a lot of brainpower, assets, TMs and computing power. For example, Cisco’s Quad machines can run multiple virtual computers, so with statistical engines we can test multiple parameters. In four hours of work I can prepare an engine, then clone it ten times and test the same corpus with ten parameters at the same time. We can test all ten parameters over the course of a weekend because we have the infrastructure to scale. We can put the same processing power to work on SYSTRAN, as it takes a lot of work to weed out the parameters that degrade the translation. I don’t know of anybody else who is doing this.
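The clone-and-test idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the setup described in the interview: `translate_and_score` is a hypothetical callable that translates a corpus with one parameter set and returns a numeric quality score (higher is better), and threads stand in for the cloned virtual machines.

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_parameters(parameter_sets, corpus, translate_and_score, max_workers=10):
    """Score the same corpus under each parameter set concurrently.

    Returns (parameters, score) pairs sorted from best to worst score.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Every worker sees the identical corpus; only the parameters differ.
        scores = list(
            pool.map(lambda params: translate_and_score(corpus, params), parameter_sets)
        )
    ranked = sorted(zip(parameter_sets, scores), key=lambda pair: pair[1], reverse=True)
    return ranked
```

Because each run uses the same corpus, any difference in score can be attributed to the parameter set, which is what makes it possible to weed out settings that degrade the output.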

Thicke: You say Moses is being used experimentally?

Vazquez: Like everyone in the industry, we are hoping we can combine the forces of the two approaches, rule-based and statistical, like a hybrid, or find a parameter that will allow us to bypass the degradations that occur. For example, with statistical machine translation (SMT), the machine is processing the content statistically rather than analyzing it, which creates many issues, especially with non-translatable sections and text formats. That’s what makes SMT so challenging to deploy. SMT can change the meaning of the document, and that can be treacherous for our content. We have to be careful what we feed into an SMT engine to translate because it tends to unify values and go against uniqueness. A simple example is that SMT would translate error number five as error number two just because it sees more instances of error number two.

There is that particular problem as well as frequent problems around the format, making it difficult to use SMT in an environment like mine. We don’t have a lot of straight text files (user-generated content, or UGC, is another problem) and we have images, so our content is mixed. We might have, say, console output mixed with text. Moses has a hard time decoding that output, so it changes the values.
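One simple way to catch the changed-value problem after the fact is to compare the numbers in the source against the numbers in the MT output. The sketch below is an assumed, generic check rather than anything described in the interview: it only looks at digit sequences (including dotted values such as IP addresses or version numbers), so it would not catch numbers spelled out as words or locale-specific number formats.

```python
import re
from collections import Counter

# Matches plain integers and dotted values such as 12.4 or 192.168.1.1.
NUMBER = re.compile(r"\d+(?:\.\d+)*")


def numbers_preserved(source, target):
    """Return True if source and target contain the same numeric tokens.

    Counts are compared, so a value that is dropped, duplicated or
    swapped for another value is reported as a mismatch.
    """
    return Counter(NUMBER.findall(source)) == Counter(NUMBER.findall(target))
```

Segments that fail the check could be routed to human review instead of reviewing every document.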

Thicke: It sounds like there’s no built-in understanding of the importance of certain concepts, such as a number that shouldn’t be changed. Does this happen often?

Vazquez: Not always, but it happens often enough that we need to review it. And if you need to review every single thing and correct it, you lose the appeal of MT. Say 10,000 documents take five minutes each to correct — that’s 20 weeks of work. It’s impossible to sustain an operation like that. I expect the hybrid approach to work, but the parameters need to be set. That’s one thing we are looking at experimentally.
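The 20-week figure can be checked with a quick calculation. The 40-hour working week below is an assumption; the interview does not state one.

```python
def review_weeks(documents, minutes_per_document, hours_per_week=40):
    """Estimate the working weeks needed to review every document."""
    total_hours = documents * minutes_per_document / 60
    return total_hours / hours_per_week


# 10,000 documents at five minutes each is 50,000 minutes of review.
weeks = review_weeks(10_000, 5)
print(f"{weeks:.1f} weeks")
```

Under that assumption the result is a little under 21 weeks for one full-time reviewer, consistent with the estimate above.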

Thicke: Tell me about your process to improve quality.

Vazquez: Translate, work on the engine, translate and work on the engine again. Our content is improved constantly even after it’s been published. We’re always retranslating it. How often we retranslate depends on the maturity of the language, from every week to every four to six months. If the language is very young (for example, a new product with new terminology), we do this more often, because the only way to catch errors is to review the output. For example, we have a product called Smart Care. Initially, our engine didn’t know it was a product name and translated it as “wise caring,” a minor yet obvious error that needed to be addressed before production. Another network term was translated as “romaine lettuce” by the engine. The computer term “ping” was translated as “bullet whistle.”

We have a select, highly trained group of people (terminology specialists, lexicographers and MT users) who work on two distinct processes. First, they identify what the engine is doing wrong, which is done mostly by translating and reviewing. We may use terminology extraction, statistical extractions or document selection, but at the end of the day the first step is reading MT output and flagging the errors. That takes us to the second step: fixing the errors. When an error is identified, it is classified and corrected. Once these two steps are completed, we repeat the process to see if what was fixed in step two passes the quality controls of step one.

Thicke: Do you post-edit your MT?

Vazquez: We do have post-editing in our workflow, but always with an eye on improving MT quality.

Thicke: You mentioned crowdsourcing earlier on. How do you use the power of the crowd to improve your engines?

Vazquez: Like any big company, Cisco is looking at crowdsourcing to create knowledge through working with communities and user groups. The challenge is how to combine crowdsourcing with MT and to see how we can help the community with MT. Several pilots are going on in different places, not only to get people to use MT but also to let them suggest better translations. My aim is to get regular users to be lexicographers without knowing they are. We work with highly skilled, multilingual technical engineers whose first language is the target language, and we encourage them to use the MT function. There are some tools that allow them to see the MT and propose alternative translations. We capture that information and identify the source so we know how reliable it is.

We started with a manual process, but now users can customize their settings. That is working out really well and we are getting a great deal of information to improve the engine from our technical support engineers, who are on our front line of customer support. It’s also a big boost in jump-starting new engines. We know a year in advance that we are going to deploy a new engine (for example, a Thai engine), so we make a basic engine and encourage our Thai-speaking engineers to use it and give their feedback. They benefit from the better MT that they help create and the engine benefits from lexicography work performed by people who are familiar with the terminology.

Thicke: Looking forward, what’s next for you?

Vazquez: The next generation of engines — engines that will recognize parts of speech and be able to reconstruct the format of the source in the target in cases where the legacy TM does not match the source format but matches the text. This will be a breakthrough. When MT engines do this, they will create a new space that may replace other computer-aided translation tools, since they can do the same job but smarter. That is a day I am looking forward to.