MultiTrans Version 4.4, R2 SP1

By Angelika Zerfaß April 12, 2011

I have to confess that MultiTrans has not been on my radar for quite some time, and everything seemed rather quiet on that front. That is not surprising, given that many of MultiCorpora’s customers are large-scale, semi-secret and public-sector institutions with internal translation departments, such as government and United Nations agencies. This changed, though, with the acquisition of Flow MMX from Beetext last autumn, as Flow MMX users have given MultiTrans a greater presence in the vendor world.

Although MultiTrans encompasses workflow, translation memory (TM) and terminology management, this review concentrates on specific enhancements to the MultiTrans TextBase TM component. At the time of writing, the enhancements that the SP1 service pack adds to version 4.4 R2 were available for evaluation.

This new release incorporates input from the user base (enhancement and new feature requests) and also contains a number of usability improvements that will be particularly meaningful to existing users. In this review, I cover the creation and use of metadata; penalties and bonuses on translation material; the XLIFF Editor; SRX support; and API developments.

Metadata

MultiCorpora has a long history of offering subsegment leveraging, or what TAUS calls “advanced leveraging.” The strengths in subsegment leveraging have been, in part, based on the architecture of MultiTrans, whereby translatable content is extracted from source files into TextBase TMs (corpora of parallel documents) that are more conducive to flexible segmentation views than conventional segment-based TM systems. Because the TextBase TM approach makes it easy for the product to present translators with surrounding context for any given segment or term, MultiCorpora felt for some time that it was unnecessary to provide supportive functionality such as in-context matching. That idea has changed.

The company now recognizes that isolating in-context matches and associating metadata with individual segments have real-world uses beyond giving translators contextual information. Accordingly, the TextBase TM approach has been modified to support this type of functionality, while users retain the subsegment matching and full-paragraph matching that distinguished MultiTrans in the past. These uses include recording the change history of translated segments, enhanced searching capabilities and excluding certain segments from translation word counts.
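Using the definition given in the conclusions of this review (a 100% match whose preceding and following segments are also 100% matches), the following minimal Python sketch shows how in-context matches could be detected in a TextBase-style corpus of aligned documents. It assumes plain exact string comparison and treats document boundaries as matching boundaries; the function and type names are hypothetical, and this is not MultiCorpora’s implementation.

```python
from typing import List, Optional, Tuple

# One TM document: (source, target) segment pairs in document order.
TMDocument = List[Tuple[str, str]]


def _segment_at(segments: List[str], i: int) -> Optional[str]:
    """Return segments[i], or None when i falls outside the document."""
    return segments[i] if 0 <= i < len(segments) else None


def find_in_context_matches(
    new_doc: List[str], corpus: List[TMDocument]
) -> List[Optional[str]]:
    """Return, for each new segment, the target of an in-context match or None.

    A segment counts as in-context when it matches a TM source exactly and the
    segments immediately before and after it also match exactly. Document
    boundaries (None) must coincide as well.
    """
    results: List[Optional[str]] = []
    for i, segment in enumerate(new_doc):
        match: Optional[str] = None
        for tm_doc in corpus:
            sources = [src for src, _ in tm_doc]
            for j, (src, tgt) in enumerate(tm_doc):
                if (
                    src == segment
                    and _segment_at(sources, j - 1) == _segment_at(new_doc, i - 1)
                    and _segment_at(sources, j + 1) == _segment_at(new_doc, i + 1)
                ):
                    match = tgt
                    break
            if match is not None:
                break
        results.append(match)
    return results


corpus = [[("Press Start.", "Appuyez sur Start."),
           ("Wait.", "Attendez."),
           ("Done.", "Terminé.")]]
print(find_in_context_matches(["Press Start.", "Wait.", "Close."], corpus))
# ['Appuyez sur Start.', None, None]: "Wait." is a 100% match, but the
# segment after it differs, so it is not an in-context match.
```

Because a TextBase keeps whole documents, the neighbouring segments are available for free, which is why this check fits the TextBase architecture naturally.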

But as with any kind of metadata support in a TM solution, there are several things to consider: what kind of metadata does the product support, what can one do within the product using that metadata, and how can one transfer metadata to and from the product? Because MultiTrans imports and exports TMX level 2, not only the translation units (TUs) are preserved, but also the metadata and formatting information associated with the individual TUs.
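For readers unfamiliar with TMX, the following Python sketch (standard library only) writes a small TMX 1.4 document in which each TU carries system metadata as the standard `creationid` and `creationdate` attributes, custom metadata as `<prop>` elements using the conventional `x-` prefix for user-defined types, and level 2 inline markup (`<bpt>`/`<ept>`) inside `<seg>`. The field names (`Project`, `Delivery`) and header values are illustrative assumptions; how MultiTrans maps its own metadata fields onto TMX is not covered here.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced attribute as xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units):
    """Serialize translation units (plain dicts) as a TMX 1.4 document string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "sketch",          # illustrative header values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "sketch",
        "adminlang": "en",
        "srclang": "en",
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        # System metadata maps onto standard TU attributes.
        tu = ET.SubElement(body, "tu", {
            "creationid": unit["created_by"],
            "creationdate": unit["created_date"],   # YYYYMMDDThhmmssZ
        })
        # Custom (user-defined) metadata becomes <prop type="x-...">.
        for name, value in unit["custom"].items():
            ET.SubElement(tu, "prop", type=f"x-{name}").text = value
        for lang, fragment in unit["segments"].items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            # Fragments may contain level 2 inline codes such as <bpt>/<ept>.
            tuv.append(ET.fromstring(f"<seg>{fragment}</seg>"))
    return ET.tostring(tmx, encoding="unicode")


def read_custom_metadata(tmx_text):
    """Return one {prop type: value} dict per TU."""
    root = ET.fromstring(tmx_text)
    return [{p.get("type"): p.text for p in tu.findall("prop")}
            for tu in root.iter("tu")]


units = [{
    "created_by": "AZ",
    "created_date": "20110412T093000Z",
    "custom": {"Project": "A", "Delivery": "normal"},
    "segments": {
        "en": 'Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>.',
        "fr": 'Cliquez sur <bpt i="1">&lt;b&gt;</bpt>Enregistrer<ept i="1">&lt;/b&gt;</ept>.',
    },
}]
tmx_text = build_tmx(units)
print(read_custom_metadata(tmx_text))  # [{'x-Project': 'A', 'x-Delivery': 'normal'}]
```

The round trip is the point of the exercise: a receiving tool that understands `<prop>` and the inline codes can recover both the metadata and the formatting, not just the text.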

Another new feature is the possibility of creating a project-specific TMX or text export directly from the Analysis Agent. This means that when analyzing a new source document against the TM, the resulting document-specific matches, plus associated metadata, can be exported directly to TMX so that only the relevant matches are retrieved from the memory.
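A project-specific export essentially filters the memory against the new document. The sketch below keeps every TM unit whose source is at least 75% similar to some segment of the new document. Python’s `difflib.SequenceMatcher.ratio()` stands in for MultiTrans’s own fuzzy-matching algorithm, and both the 0.75 threshold and the function name are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def extract_project_tm(new_segments, tm, threshold=0.75):
    """Return only the TM units whose source resembles some new segment.

    Similarity is difflib's ratio (0.0 to 1.0), used here as a stand-in for a
    real TM fuzzy-match score.
    """
    selected = []
    for unit in tm:
        best = max(
            (SequenceMatcher(None, unit["source"], seg).ratio()
             for seg in new_segments),
            default=0.0,
        )
        if best >= threshold:
            selected.append(unit)
    return selected


tm = [
    {"source": "Press the Start button.", "target": "Appuyez sur le bouton Start."},
    {"source": "Remove the battery cover.", "target": "Retirez le couvercle de la pile."},
]
project_tm = extract_project_tm(["Press the Stop button."], tm)
print([u["source"] for u in project_tm])  # ['Press the Start button.']
```

The second unit is dropped because it resembles nothing in the new document, which is exactly what keeps the exported file small.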

I have seen this feature in other tools as well, and it is often used to decrease the size of TMs that are sent out to a translator or to be able to control the amount of content from a TM that is given to a vendor or freelancer. On the other hand, extracting only the segments that will provide matches during translation ignores the fact that translators often use concordance searching — searching for terms and phrases in the whole TM if no match for a segment was found. This is something project managers should keep in mind when creating the TM resources for external vendors. In the case of MultiTrans, the limitations implied by project-specific TM are mitigated by the TextBase web module that allows translators to consult a client’s server TextBases simply using their internet browser.
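A concordance search, by contrast, scans every unit in the memory rather than only the units that matched during analysis. A minimal case-insensitive version might look like this in Python; the function name and the list-of-dicts TM representation are hypothetical.

```python
def concordance(tm, phrase):
    """Return (source, target) pairs whose source contains phrase, ignoring case."""
    needle = phrase.casefold()
    return [(u["source"], u["target"])
            for u in tm if needle in u["source"].casefold()]


tm = [
    {"source": "Remove the battery cover.", "target": "Retirez le couvercle de la pile."},
    {"source": "Close the Battery Cover firmly.", "target": "Fermez bien le couvercle de la pile."},
    {"source": "Press the Start button.", "target": "Appuyez sur le bouton Start."},
]
for source, target in concordance(tm, "battery cover"):
    print(source, "->", target)
```

Run against a project-specific extract instead of the full memory, the same search can only find phrases in units that happened to pass the match threshold, which is the limitation described above.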

Another important part of metadata preservation is the retention of inline formatting. Previously, the MultiTrans TextBase TM approach separated translatable content entirely from everything else, including formatting (Figure 1). A by-product of adding metadata support is the ability to retain inline formatting, which represents a major step forward in ease of use during the translation process.

MultiTrans SP1 provides two types of metadata: system metadata that is populated by default and custom metadata that is user-defined. System metadata fields include created by, created date, modified by, modified date, source language and so on. User-defined metadata fields can be created with a built-in Metadata Editor. This makes it possible to create fields in the MultiTrans TM that match existing fields in files to be exchanged with other systems. It also provides the ability to filter segments within the memory. When beginning pretranslation, for example, a window opens in which users can choose segment matching options and penalties and bonuses, and create metadata filters for the process. Metadata fields are available at the TextBase TM, document, translation unit and segment level, and they can be defined as system fields, pick-lists or text fields.
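Conceptually, a metadata filter restricts which units may be used for pretranslation. The following Python sketch models a TU with separate system and custom metadata and keeps only units whose values fall within an allowed set for every filtered field. The class, the field names (`created_by`, `Client`) and the lookup order are assumptions for illustration, not MultiTrans’s data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class TranslationUnit:
    source: str
    target: str
    # System metadata, filled in automatically (e.g. created_by, source_language).
    system: Dict[str, str] = field(default_factory=dict)
    # Custom metadata defined by the user (e.g. a "Client" pick-list).
    custom: Dict[str, str] = field(default_factory=dict)

    def metadata(self, name: str) -> Optional[str]:
        """Look a field up among system fields first, then custom fields."""
        if name in self.system:
            return self.system[name]
        return self.custom.get(name)


def filter_units(units: List[TranslationUnit],
                 criteria: Dict[str, Set[str]]) -> List[TranslationUnit]:
    """Keep units whose value for every filtered field is in the allowed set."""
    return [u for u in units
            if all(u.metadata(name) in allowed for name, allowed in criteria.items())]


units = [
    TranslationUnit("Save", "Enregistrer", {"created_by": "AZ"}, {"Client": "Acme"}),
    TranslationUnit("Save", "Sauvegarder", {"created_by": "JD"}, {"Client": "Acme"}),
    TranslationUnit("Open", "Ouvrir", {"created_by": "AZ"}, {"Client": "Other"}),
]
usable = filter_units(units, {"created_by": {"AZ"}, "Client": {"Acme"}})
print([u.target for u in usable])  # ['Enregistrer']
```

A unit that lacks a filtered field entirely is excluded, since its missing value is never in the allowed set.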

Penalties and bonuses

This new metadata support also allows the assignment of varied penalties and bonuses to translation units. MultiTrans supports the concept of confirmed or unconfirmed alignments, with confirmed alignments being those translation units that have been reviewed, edited as necessary, and then flagged as linguistically correct. Some users may prefer to give confirmed alignments a small bonus, and in-context matches an even greater bonus in comparison to unconfirmed alignments because the latter are less likely to be acceptable as-is. Segments containing placeables or formatting differences may be assigned a penalty because they are likely candidates for needing extra translator attention.

Because MultiTrans is multilingual and multidirectional by design, matches originating from a TM in which the language direction has been reversed (source and target swapped) may need to be assigned a penalty. The same is true of indirect translations. For example, an English source has been translated into both French and German, and the resulting French and German target segments are now used together as a language pair in the TM. Matches generated from this French/German TM may need an additional penalty because the two languages may not correspond as closely to one another as each does to the original English.

Users can also create and manage metadata filters to assign more specific penalties or bonuses based on any metadata values of their choosing (Figure 2). For example, consider two projects, A and B, for which TUs exist in the TM. You could create a custom metadata field to record the conditions under which the projects were translated. Let’s assume that Project A was delivered under normal time limits and Project B in last-minute rush mode, with extreme time pressure constraining the editing and proofreading phases. It is reasonable to assume that matches from Project A are of higher quality than those from Project B, so translation units from Project A may be given a bonus and those from B a penalty. Matches from Project A will then be given preferential treatment, that is, presented at the top of the list of potential matches. Translation units originating from B are not hidden; they simply appear lower in the list because of the penalty.
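The Project A/Project B scenario can be sketched as simple score arithmetic: each applicable bonus or penalty is added to the raw match score, and candidates are sorted by the result. In the Python sketch below, the adjustment table and its values are invented for illustration; MultiTrans lets users configure these, and how it scales or caps adjusted scores is not covered here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Match:
    target: str
    score: int                 # raw match score in percent
    metadata: Dict[str, str]


# Hypothetical adjustment table: (field, value) -> percentage points.
ADJUSTMENTS: Dict[Tuple[str, str], int] = {
    ("alignment", "confirmed"): +1,
    ("context", "in-context"): +2,
    ("direction", "reversed"): -3,
    ("direction", "indirect"): -5,
    ("Delivery", "normal"): +2,
    ("Delivery", "rush"): -2,
}


def adjusted_score(match: Match) -> int:
    """Raw score plus every bonus (positive) or penalty (negative) that applies."""
    return match.score + sum(ADJUSTMENTS.get(item, 0) for item in match.metadata.items())


def rank(matches: List[Match]) -> List[Match]:
    """Order matches best-first; penalized matches are demoted, never dropped."""
    return sorted(matches, key=adjusted_score, reverse=True)


candidates = [
    Match("Traduction B", 100, {"Delivery": "rush"}),
    Match("Traduction A", 100, {"Delivery": "normal"}),
    Match("Traduction C", 99, {"alignment": "confirmed"}),
]
for m in rank(candidates):
    print(m.target, adjusted_score(m))
```

Both 100% matches stay in the list; Project A’s simply rises above Project B’s, and a confirmed 99% alignment can even overtake the penalized rush-job match.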

XLIFF Editor

During 2010, MultiCorpora began shipping MultiTrans with a basic XLIFF Editor for working with XML-based file formats. As of SP1, this editor has been substantially enhanced and it is now offered as an editing environment choice when the user has launched the MultiTrans Client. The look-and-feel of the XLIFF Editor is somewhat reminiscent of Office 2007, which may help translators used to that environment feel comfortable when using it. It also provides a side-by-side, TU-by-TU layout that many translators prefer. For those who wish it, though, certain third-party products such as Microsoft Word may still be used as the editing environment in conjunction with the Client and Translation Agent.

The XLIFF Editor operates by extracting translatable content from XML-based file formats, such as those in Microsoft Office 2007, or from interchange formats like MIF or INX. IDML does not appear to be supported yet. Out of the box, some 17 formats are supported for import, and a built-in XML mapping utility can create parsing rules for proprietary XML schemas (Figure 3). While work is in progress, the extracted translatable content, plus any relevant metadata, is stored in an XLIFF file, and a skeleton file is kept for subsequently recreating the original document format. The XLIFF Editor itself is not required to translate the XLIFF file; the intermediate file may be edited in any tool that can import and export industry-standard XLIFF. After translation, export from the XLIFF Editor reunites the target-language content of the bilingual XLIFF file with the document skeleton, and the final file can then be used in the application from which it originated. The XLIFF Editor interacts with the MultiTrans Client via the Translation Agent, accessing TextBases and TermBases for segment matches, subsegment matches and terminology. If the proper permissions are activated, the MultiTrans Client can also access publicly available memories, such as those from the TAUS Data Association or MyMemory, or connect to machine translation engines.
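The extract, translate and merge cycle described above can be illustrated with a deliberately small Python sketch. It copies the text of chosen elements into XLIFF 1.2 `<trans-unit>` elements, leaves a placeholder in a skeleton tree, and later swaps each unit’s `<target>` (or its source, if untranslated) back in. It only handles elements containing plain text, and the placeholder format, function names and `original` file name are assumptions; real filters, including MultiTrans’s, also deal with inline markup and metadata.

```python
import copy
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize XLIFF elements without a prefix


def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return f"{{{XLIFF_NS}}}{tag}"


def extract(xml_text, translatable_tags, src_lang="en", tgt_lang="fr"):
    """Return (xliff_text, skeleton). Handles elements containing plain text only."""
    skeleton = ET.fromstring(xml_text)
    xliff = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(xliff, q("file"), {
        "original": "document.xml", "datatype": "xml",
        "source-language": src_lang, "target-language": tgt_lang,
    })
    body = ET.SubElement(file_el, q("body"))
    to_extract = [e for e in skeleton.iter() if e.tag in translatable_tags]
    for n, elem in enumerate(to_extract, start=1):
        tu = ET.SubElement(body, q("trans-unit"), id=str(n))
        ET.SubElement(tu, q("source")).text = elem.text or ""
        elem.text = f"%%unit-{n}%%"   # placeholder left in the skeleton
    return ET.tostring(xliff, encoding="unicode"), skeleton


def merge(skeleton, xliff_text):
    """Rebuild the document, using each unit's target (or its source if untranslated)."""
    texts = {}
    for tu in ET.fromstring(xliff_text).iter(q("trans-unit")):
        target = tu.findtext(q("target"))
        placeholder = f"%%unit-{tu.get('id')}%%"
        texts[placeholder] = target if target is not None else tu.findtext(q("source"))
    document = copy.deepcopy(skeleton)  # keep the skeleton reusable
    for elem in document.iter():
        if elem.text in texts:
            elem.text = texts[elem.text]
    return ET.tostring(document, encoding="unicode")


doc = "<manual><title>Setup</title><para>Press Start.</para></manual>"
xliff_text, skeleton = extract(doc, {"title", "para"})

# A translator, in any XLIFF-capable tool, adds <target> elements.
french = {"Setup": "Installation", "Press Start.": "Appuyez sur Start."}
root = ET.fromstring(xliff_text)
for tu in root.iter(q("trans-unit")):
    ET.SubElement(tu, q("target")).text = french[tu.findtext(q("source"))]

print(merge(skeleton, ET.tostring(root, encoding="unicode")))
# <manual><title>Installation</title><para>Appuyez sur Start.</para></manual>
```

Because the XLIFF file is the only thing the translator touches, any XLIFF-capable editor can stand in for the step in the middle, which is the interoperability point made above.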

The XLIFF Editor also provides preview functionality for a selection of the supported document formats, including files from Word, PowerPoint and Excel, as well as XML in conjunction with an XSL style sheet. Either the source or the target document can be previewed on its own, or both side by side. The XLIFF Editor appears relatively simple at first glance, and some areas will probably see further development in the future. For example, I miss a printable preview, the ability to add formatting tags that are not present in the source (I am thinking of superscript for English numbers here) and the ability to search text inside tags. But it already has some very useful quality assurance features. In the SP1 release, these include automatic checks that documents treated within the XLIFF Editor adhere to the syntax rules for HTML or XML or, in technical terms, are “well-formed.” There is also a live validation feature, a validation report and a filter to help translators and editors spot violations such as missing or incorrect tags. Translators can even validate source files as they are imported into the XLIFF Editor by selecting this option during the import process, which I find extremely useful: it proactively exposes problems that otherwise might not appear until later in the process.
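Two of these checks are easy to sketch in Python: a well-formedness test that parses a segment wrapped in a dummy root element, and a tag comparison that reports tags present in the source but missing from the target, or vice versa. The regular expression and function names are assumptions, and the check applies strict XML rules, so HTML-only constructs such as `&nbsp;` would be reported as errors; this is not how the XLIFF Editor implements validation.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Opening, closing and self-closing tags such as <b>, </b>, <br/>.
TAG_RE = re.compile(r"</?[A-Za-z][^<>]*>")


def is_well_formed(fragment: str) -> bool:
    """True if the segment parses as XML once wrapped in a dummy root element."""
    try:
        ET.fromstring(f"<root>{fragment}</root>")
    except ET.ParseError:
        return False
    return True


def tag_problems(source: str, target: str) -> dict:
    """List tags present in the source but not the target, and vice versa."""
    src = Counter(TAG_RE.findall(source))
    tgt = Counter(TAG_RE.findall(target))
    return {
        "missing": sorted((src - tgt).elements()),
        "extra": sorted((tgt - src).elements()),
    }


source = "Click <b>Save</b> now."
target = "Cliquez sur <b>Enregistrer maintenant."
print(is_well_formed(target))        # False: <b> is never closed
print(tag_problems(source, target))  # {'missing': ['</b>'], 'extra': []}
```

Counting tags with a `Counter` rather than a set means a target that drops one of two identical `<b>` pairs is still flagged.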

By importing memories from any TM tool in TMX format, translators can use the XLIFF Editor to perform quality assurance checks on them, run global search-and-replace operations and carry out similar TM maintenance work. The spellchecking engine has also been made more robust and now supports a number of additional languages. Thanks to its modular architecture, MultiCorpora can easily add quality assurance modules as needed to expand the functionality palette as customer needs evolve (Figure 4).
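As an illustration of such TM maintenance, the Python sketch below performs a global search and replace on the target-language segments of a TMX file while leaving inline codes intact. It edits only a segment’s own text and the text following each inline element, so anything inside inline elements (the native codes in `<bpt>`/`<ept>`, or `<hi>` content) is untouched; the function name and sample data are hypothetical, and this is not the XLIFF Editor’s implementation.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def replace_in_targets(tmx_text, lang, old, new):
    """Replace `old` with `new` in the running text of every <seg> in `lang`.

    Only the segment's own text and the text after each inline element are
    edited; content inside inline elements is left unchanged.
    Returns (updated TMX text, number of replacements).
    """
    root = ET.fromstring(tmx_text)
    count = 0
    for tuv in root.iter("tuv"):
        seg = tuv.find("seg")
        if tuv.get(XML_LANG) != lang or seg is None:
            continue
        if seg.text:
            count += seg.text.count(old)
            seg.text = seg.text.replace(old, new)
        for child in seg:  # text following each inline element lives in .tail
            if child.tail:
                count += child.tail.count(old)
                child.tail = child.tail.replace(old, new)
    return ET.tostring(root, encoding="unicode"), count


tmx = ('<tmx version="1.4"><header/><body><tu>'
       '<tuv xml:lang="en"><seg>Click <bpt i="1">&lt;b&gt;</bpt>Save'
       '<ept i="1">&lt;/b&gt;</ept>.</seg></tuv>'
       '<tuv xml:lang="fr"><seg>Cliquez sur <bpt i="1">&lt;b&gt;</bpt>Sauver'
       '<ept i="1">&lt;/b&gt;</ept>. Sauver souvent.</seg></tuv>'
       '</tu></body></tmx>')
updated, n = replace_in_targets(tmx, "fr", "Sauver", "Enregistrer")
print(n)  # 2
```

Working on the parsed tree rather than the raw file text is what prevents a replacement from accidentally corrupting markup inside the inline codes.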

SRX compliance and APIs

MultiCorpora states that its policy is to adhere as strictly as possible to industry standards, and XLIFF and TMX support have been central to its product philosophy for some time. With the release of SP1, MultiCorpora has also implemented SRX 2.0 support for the exchange of segmentation rules. In theory, this enhances interoperability with other systems that support the same standard: the segmentation rules used to create a TM can be transferred to another tool, which can then segment text the same way as the providing tool and obtain better matches from the exchanged TM. To my knowledge, however, SRX is not widely used at the moment, although several tools support it.
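To show what exchanged segmentation rules actually contain, here is a simplified Python sketch of SRX-style rule application. An SRX rule has a `break` attribute (yes or no), a `<beforebreak>` pattern and an `<afterbreak>` pattern; rules are tried in order at each position, and the first one that matches decides. Language maps, cascading and other SRX 2.0 features are omitted, SRX specifies ICU-style regular expressions that Python’s `re` only approximates, and the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SrxRule:
    is_break: bool   # SRX break="yes" / break="no"
    before: str      # <beforebreak> pattern
    after: str       # <afterbreak> pattern


def segment(text: str, rules: List[SrxRule]) -> List[str]:
    """Split text using ordered break/no-break rules; the first rule that matches wins."""
    compiled = [(r.is_break, re.compile(f"(?:{r.before})$"), re.compile(r.after))
                for r in rules]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            # before must end exactly at pos; after must start exactly at pos
            if before.search(text, 0, pos) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break
    segments.append(text[start:])
    return [s.strip() for s in segments if s.strip()]


rules = [
    SrxRule(False, r"\bMr\.", r"\s"),          # exception: no break after "Mr."
    SrxRule(True, r"[.?!]", r"\s+[A-Z]"),      # break after end punctuation
]
text = "Call Mr. Smith today. He is waiting! Thanks."
print(segment(text, rules))
# ['Call Mr. Smith today.', 'He is waiting!', 'Thanks.']
```

Dropping the exception rule splits the first sentence after “Mr.”, which illustrates why two tools sharing a TM but not their rules can disagree on segment boundaries and lose matches.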

Previous versions of MultiTrans have connected with third-party workflow and content management applications through open, documented application programming interfaces (APIs). The three most commonly used of these support frequently performed tasks that can be usefully automated and called from other systems. These are Analysis, Pretranslation and TextBase Building/Updating. As MultiCorpora has extended support for web-based functionality across its entire product line, the APIs have been rewritten to communicate with third-party applications through web services. The first implementation of the rewritten APIs has been the integration with Flow MMX. I was told that others will follow.

Enhancements

Beyond the XLIFF Editor, the new version adds some usability features that will mean more to current MultiTrans users than to readers learning about the product for the first time. Among these is the consolidation of the steps for opening TextBase TMs and TermBases into a single “Start Project” dialog. It replaces a multi-panel wizard, which was itself a major improvement over the many clicks required in earlier versions. The new dialog presents TextBase TMs in a tree-style list from which they can easily be selected. All TextBases are treated as if they were Server TextBases for selection purposes: local TextBases are listed under the “Local” branch of the tree, and server TextBase TMs under the name of the respective server. TextBase TMs and TermBases that have been selected for access are listed in the top windows of the dialog. After choosing the desired repositories, a single click on the Start button is all that is required.

While testing these new dialogs, however, I noticed that they are currently strongly geared towards mouse users. As a keyboard person myself, I like to navigate dialogs and settings using the Tab key, cursor keys and shortcuts, and this does not seem to have been implemented thoroughly. Often there is no alternative to the mouse, and sometimes when you move from one dialog to the next, the focus is on the Cancel button, so if you press Enter or Return to confirm something, you might accidentally trigger a cancel action. It is a minor point, as the functionality behind the dialogs works well, but I hope that more effort will be put into improving this in the future. Other new usability features that existing users will appreciate include a new interface with convenient drag-and-drop support for manually selecting files when building TextBase TMs; separation of the terminology extraction process from the TextBase TM building process, which accelerates the building of TMs; and context-sensitive online help.

Conclusions

By implementing support for exposing in-context matches (100% matching segments that are also preceded and followed by 100% matching segments), MultiCorpora has gone a long way toward removing one of the client-side objections to adoption. Yet MultiTrans still lacks the option to lock segments against translation. This kind of locking is, of course, philosophically objectionable to many translators, who feel that they should at least have the option of reviewing and, if appropriate, improving matches that have previously been rated at 100%. Clients, on the other hand, frequently do not wish to pay for this service. In MultiTrans, the ability to replace 100% matching paragraphs in one step, rather than segment by segment, somewhat weakens the rationale for locking perfect segment matches, but the door has been opened to implementing segment locking in a future release if customers demand it.

MultiCorpora’s XLIFF Editor is a compact, easy-to-use editing environment that does just enough for many translators’ needs. I look forward to seeing additional quality assurance functionality, which may serve MultiCorpora well in positioning this editor as the tool of choice for tasks such as post-editing machine translation output, especially since MultiTrans itself supports retrieving MT and publicly shared translation content. Testing the scalability of MultiTrans was beyond what I could do for this review; however, it is worth mentioning that MultiCorpora supports one of the largest self-contained translation entities in the world. The deployment at the Translation Bureau, a branch of the Canadian government, involves over 700 translators who simultaneously access TM through MultiTrans. This is not to imply that MultiTrans is only suitable for enterprise-scale operations; my tests were made using a client license running on a conventional PC. A MultiTrans solution can thus range from a single-user application through a small-scale, server-based application with four or five workstations, all the way up to very large deployments.