ABBYY SmartCAT

If you didn’t know any better, you might conclude from a cursory examination of the translation technology landscape, particularly translation environment tools (CAT tools), that there’s already plenty to choose from. You might even think that the market is flooded with tools.

After a second, closer look, however, it would become apparent that there is always room for new solutions.

Why? Well, compared to 10 or 15 years ago, the market is a lot less homogeneous. While SDL’s products still account for a large share of the translation technology market, other vendors such as Kilgray and Smartling continue to chew away at SDL’s previously almighty position, opening the market to their own products as well as others.

In addition, the translation exchange format XLIFF has had more of an impact on exchangeability than previous standards, making workflows a lot less tool-centric.

Finally, the general notion of faster response times, real-time collaboration through the cloud, and minimal onboarding has taken hold. Cloud-based tools such as Memsource, XTM, Wordbee and, more recently, MateCat are well established, and tools such as memoQ and Across are adding fully cloud-based options as well.

Oh, yes, and then there’s the free market where many want a piece of the pie.

In the past few years I’ve consulted for a good number of developers (often former translators) who have hatched ideas on how to garner one of those pieces of the pie. With a small number of exceptions, I ended my consultancy after a meeting or two because I could not see how a) the technology they envisioned was all that different, b) their business plan was sustainable and c) support and development could be maintained in the long run. If even one of those items was in place, I usually continued talking to them.

But when a company such as ABBYY — with more than 1,200 employees worldwide, a proven track record in language technologies (ABBYY FineReader, PDF Transformer, Lingvo, Aligner, Compreno) and past investments by the Russian government — enters the translation environment tool market, it’s time to sit up and take notice.

The name ABBYY, by the way, is Proto-Tibeto-Burman and means “keen eye.”

ABBYY SmartCAT was officially launched a few months ago, though before that it had been in use for some time by ABBYY Language Services, ABBYY’s language service arm, in particular for the ongoing massive volunteer crowdsourcing project to translate Coursera MOOCs into Russian. As a result of this and other projects, the tool presently has approximately 5,000 active users.

What is SmartCAT? It’s a completely cloud-based translation environment tool with a wholly browser-based interface. Files are uploaded to a server where they are processed and presented to the translator in the typical tabular translation interface (source on the left, target on the right). ABBYY uses Microsoft Azure servers in Ireland and the United States and its own servers in Moscow; it’s also possible to install SmartCAT on your own servers.

The supported file types presently include Microsoft Office files (of the -x variety as well as the earlier format), OpenOffice/LibreOffice files, PDF, text, Trados TTX, XLIFF (including SDLXLIFF), and a huge variety of graphic formats. Since ABBYY already owned a sophisticated optical character recognition (OCR) solution, it integrated that right into SmartCAT. This means that graphic files can be internally OCRed and turned into Word documents, and PDF files can be read regardless of whether they are text- or image-based.

SmartCAT’s solution is most likely the best PDF workaround offered among translation environment tools. Of course, this doesn’t mean that you won’t have to do some amount of formatting to the resulting Word document when finalizing it, but you can expect the quality that you’re already familiar with if you’ve used ABBYY’s stand-alone OCR and/or PDF conversion tools in the past.

The recognition of graphics works fine when regular fonts are used (see Figures 1-3), and not so great when very creative fonts are involved, which is not surprising.

While it’s impressive that PDF files and graphic files can be processed, some other important formats are not yet supported. It’s especially relevant that HTML and XML are missing, and neither InDesign, FrameMaker nor any software development formats are supported. All of those are in the works (HTML is already being beta-tested), but these omissions show that SmartCAT is not quite where it probably should be for a fully released tool.

Other things that are missing and are on the roadmap for the next few months include support for the termbase exchange format TBX (support for the translation memory exchange format TMX, on the other hand, is well established), offline processing through XLIFF, good productivity metrics, an application programming interface to embed SmartCAT into corporate environments, and a lot of work on vendor management capabilities.

For the so-far relatively bare-bones project management in SmartCAT, users are organized in groups to which different rights are given. Once the project manager sets up and starts a workflow, it becomes largely automated by alerting the different assignees.

Anna Sidorova of ABBYY noted that the interest language service providers (LSPs) have in SmartCAT is “largely due to collaborative translation, machine translation (MT) integration and OCR,” but I imagine that once some of the features mentioned above are indeed implemented (well), that interest should expand beyond those features she lists, particularly since other tools currently offer them as well (aside from OCR).

Freelance translators have different reasons to like SmartCAT. It’s very intuitive, has a short learning curve, and is largely free. In fact it was completely free until recently, when some pricing was introduced for the OCR and MT capabilities. The first 100 pages processed with both technologies combined are free for a new user account, and from then on you pay between $0.0001 and $0.00014 per machine-translated word and $0.025 to $0.035 per OCRed page.
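To get a feel for what those rates mean in practice, here is a rough back-of-the-envelope sketch in Python. The function name and the sample volumes are my own assumptions for illustration, not anything ABBYY provides. It also ignores the free 100-page allowance, since it isn't clear how machine-translated words count toward pages, and it reports both ends of each range because the rules for where a user lands within a range aren't described here.

```python
from decimal import Decimal

# Per-unit rates in US dollars as quoted above: (lowest, highest).
# How ABBYY decides where in each range a user falls is not known here,
# so the sketch simply reports both ends.
MT_RATE_PER_WORD = (Decimal("0.0001"), Decimal("0.00014"))
OCR_RATE_PER_PAGE = (Decimal("0.025"), Decimal("0.035"))


def estimate_cost_range(mt_words: int, ocr_pages: int) -> tuple[Decimal, Decimal]:
    """Return the (lowest, highest) estimated charge in US dollars.

    Hypothetical helper: the free allowance for new accounts is
    deliberately not modeled.
    """
    if mt_words < 0 or ocr_pages < 0:
        raise ValueError("volumes must be non-negative")
    low = mt_words * MT_RATE_PER_WORD[0] + ocr_pages * OCR_RATE_PER_PAGE[0]
    high = mt_words * MT_RATE_PER_WORD[1] + ocr_pages * OCR_RATE_PER_PAGE[1]
    return low, high


if __name__ == "__main__":
    # Sample volumes chosen purely for illustration.
    low, high = estimate_cost_range(mt_words=50_000, ocr_pages=200)
    print(f"${low:.2f} to ${high:.2f}")  # prints "$10.00 to $14.00"
```

In other words, even a fairly large job of 50,000 machine-translated words plus 200 OCRed pages would come to somewhere between $10 and $14 at the quoted rates.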

As far as MT is concerned, ABBYY uses a different model than most of its competitors, which typically leave the per-character usage fee of engines such as Google Translate, Bing Translator and Yandex Translate up to the individual translator or LSP. ABBYY requires payment of these fees to ABBYY, and then pays the big MT providers in turn. ABBYY claims that it wants to make things as easy as possible for the translator, so the company has inserted itself as a middleman of sorts, but I suspect that ABBYY’s own engine being supported along with the three MT engines mentioned above is part of the reason for that (see the January/February 2014 edition of MultiLingual for an article about the ABBYY MT engine).

The pricing for LSPs is on a per-project-manager basis and ranges from $31 to $37 per license per month, making it very competitive with most of its rivals.

According to Sidorova, much like Across in its early days, the most important market for SmartCAT will be translation buyers with translation departments that will in turn encourage their vendors (LSPs and freelancers alike) to buy the tool as well. I spoke to Sergey Muratov, localization manager for language learning app developer Easy Ten (easyten.ru), who used SmartCAT to localize their tool into ten languages, with ABBYY Language Services as their service provider. He was happy with the performance of the tool, though he did mention the lack of a way to provide translators with graphical supporting materials such as screenshots, an item that is on the roadmap for SmartCAT as well.

One aspect ABBYY probably needs to emphasize when promoting this tool is the synergy between it and the other products ABBYY offers. The linguistic abilities the company assembled when developing its OCR solutions, which range from the Lingvo dictionary product to morphological recognition in three dozen languages, are all embedded into SmartCAT, making it uniquely valuable beyond its OCR capabilities. While these are features that most directly impact the translator, advanced ways of measuring productivity could expose savings in time and an increase in quality to all stakeholders.

I spoke with Logrus’s Serge Gladkoff, who has been following ABBYY’s efforts with SmartCAT, and while he noted that it’s still premature to completely evaluate SmartCAT due to the ongoing development efforts, he had the following to say: “ABBYY is a serious competitor for existing tools providers, merely based on the fact that it is a global software development company that has already managed to become number one in the world in the very competitive field of OCR software, a sector that is not very far from the language industry. In fact, ABBYY is the owner of a whole bunch of very high-tech linguistic technologies. I would be worried if I was SDL. Something to watch, most certainly.”

I’m not sure that ABBYY’s move should worry SDL (I can think of other dangers that should be at least as worrisome to SDL, if not more so), but I agree with Gladkoff that ABBYY’s forays are something to watch, most certainly.