memoQ 5.0

By Angelika Zerfaß May 18, 2012

Complex ideas implemented in an easy-to-use manner

With version 5.0 of Kilgray’s translation memory system memoQ, we now have access to some nifty new features that offer considerable productivity gains, especially in project management. While the software was still in beta, and again since its release, I tested its track changes features, terminology extraction, file-based pre-translation, file imports and more. Although most of these features are useful to project managers and translators alike (the availability of some features depends on the type of license you have), I would say that they mostly add efficiency on the project management side.

Track changes

Tracking changes is essential during translation, but changes also happen before and after translation, and these can now be captured with the history function for translation documents. Changes in the source documents become visible through the matching and word count statistics, but sometimes it is better to see the text changes more visually. The same goes for the review process, which often happens outside the translation tool, meaning that changes have to be transferred back to the bilingual documents manually. With version 5.0, there are now several places where the history of a file can be viewed.

You can see changes in source language documents, or view changes made during translation, proofreading or editing with track changes. You can see the changes between different versions of one document, even if it has been changed outside of memoQ (using the Word table export, for example), or the changes to a single segment (row history).

Users with a project management license can compare different versions of the source file and create an HTML view of the changes. To do this, you import the first version of the file and then re-import the second version, changing the file’s version number from 1 to 2. The version history function then lets you compare the first and second versions of the file, and the differences are shown in an HTML table. Figure 1 shows the changes between two source files, a view available only with a project management license. Red rows are not present in the second version of the file. Blue rows are new or changed rows in the second version.
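The article does not describe how memoQ computes this comparison internally. As a rough illustration only, here is a minimal Python sketch, assuming each row is one source segment, of how a row-level comparison could yield the red/blue view described above. The function names `compare_versions` and `to_html_table` are hypothetical.

```python
import difflib
import html

def compare_versions(old_rows, new_rows):
    """Label rows as in the view described above (hypothetical helper):
    'removed' = only in version 1 (red), 'new' = added or changed in version 2 (blue)."""
    result = []
    matcher = difflib.SequenceMatcher(a=old_rows, b=new_rows, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(("same", row) for row in old_rows[i1:i2])
        else:
            # 'replace' and 'delete' drop old rows; 'replace' and 'insert' add new ones
            result.extend(("removed", row) for row in old_rows[i1:i2])
            result.extend(("new", row) for row in new_rows[j1:j2])
    return result

COLORS = {"removed": "red", "new": "blue", "same": "black"}

def to_html_table(rows):
    """Render labeled rows as a simple HTML table (hypothetical helper)."""
    lines = ["<table>"]
    for status, text in rows:
        lines.append(f'<tr style="color:{COLORS[status]}"><td>{html.escape(text)}</td></tr>')
    lines.append("</table>")
    return "\n".join(lines)
```

A changed row shows up twice, once in red as the old text and once in blue as the new text. This mirrors the rule that red rows are absent from the second version and blue rows are new or changed.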

During translation, translators or reviewers can switch on track changes and see their changes in red: added text appears in red, and overwritten text appears in red strikethrough. In Figure 2, track changes are visible in the translation editor; the status of every changed segment is set back from confirmed (green) to edited (orange).

To visualize changes made outside of memoQ, you can compare different states of the same file. These states are saved either by exporting the file to a bilingual format for processing outside of memoQ or by manually creating a snapshot of the file. Up to now, you could export the file as a two-column RTF, send it for review and import the changed file back, and the changes from the imported file would overwrite the translation in the project. Now, after updating your project with the changes from the review, you can compare the file before and after the import and see which changes were actually made. Under history and reports, there is a list of snapshots of a translation file, and you can choose any two versions and compare them. Snapshots are created automatically by some commands, such as export to two-column RTF, but can also be created manually. This is also available with a translator license. In addition to the changes between two document versions, you can also view the changes to a single row.
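To make the idea of comparing two snapshots of a segment concrete, here is a minimal Python sketch of a word-level comparison. It is not memoQ’s algorithm. It assumes words are separated by spaces and uses a hypothetical text notation, `[-deleted-]` and `{+inserted+}`, in place of the red and strikethrough formatting.

```python
import difflib

def track_changes(before: str, after: str) -> str:
    """Compare two snapshots of one segment word by word (hypothetical helper).
    Deleted words are marked [-...-], inserted words {+...+}."""
    old, new = before.split(), after.split()
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + " ".join(old[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + " ".join(new[j1:j2]) + "+}")
    return " ".join(out)

# Snapshot before review vs. after the reviewed RTF was imported
print(track_changes("Click OK to continue.", "Click Next to continue."))
# Click [-OK-] {+Next+} to continue.
```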

Be aware, however, that you need to activate the history function when creating a project in order to use this feature, since it is not enabled by default for new projects. You can also activate it later, but the history will then only be available for files added to the project afterward, not for files already in the project, even if you re-import them.

Terminology extraction

Terminology can be the backbone of a translation, so terminology lists or databases should be created before translation rather than during or after it. When documents are created, authors should be aware that anything product-specific or company-specific, such as abbreviations and product names, should be collected for use in terminology databases during translation. memoQ now offers a statistical terminology extraction component that extracts monolingual candidate lists, based on term frequency, from translation documents, translation memories (TMs) and LiveDocs content. When extracting from the latter two, you can manually fill in target language equivalents from the bilingual material. Any accepted terms or term pairs can be transferred directly to the termbase attached to the project. Figure 3 shows a list of term candidates. The lower left-hand window shows the segments from the TM used for extraction, and the lower right-hand window shows terms found in the termbase. Terms that already exist in a termbase are highlighted with a blue background, as in the editor. Once accepted, terms can be sent to the termbase with or without a translation.
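The article only says the extraction is statistical and frequency-based. As a simplified illustration of that general idea, and not of memoQ’s actual method, the following Python sketch counts word n-grams that neither start nor end with a stopword and keeps the frequent ones. The tiny stopword list, the thresholds and the name `term_candidates` are all assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real extractor would use a per-language list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "or", "in", "on", "for", "is", "with"}

def term_candidates(segments, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs for word n-grams of 1..max_len words
    that neither start nor end with a stopword and occur at least min_freq times."""
    counts = Counter()
    for segment in segments:
        words = re.findall(r"[\w-]+", segment.lower())
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue  # "the print" or "queue and" are not useful candidates
                counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]

segments = [
    "Open the print queue.",
    "The print queue is empty.",
    "Clear the print queue and restart the printer.",
]
print(term_candidates(segments))  # "print", "queue" and "print queue", each 3 times
```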

I have to admit that I was a little disappointed at first, because I had expected true bilingual extraction. On the other hand, from my own experience providing term extractions for clients, I know that checking a bilingual extraction takes as much time as selecting the target term from concordance hits, so this type of extraction might be better than having the tool try to find the translation for you. What I am definitely still missing is a way to send context sentences to the termbase. Also, at the moment, whenever I send terms from the list to the termbase, all accepted terms are saved, regardless of whether they already exist. This creates a lot of duplicate entries if you are not careful. After doing more extractions, we found that manual extraction is a good way to start a terminology project, but once you have a certain number of terms in your termbase, it becomes more effective to use automatic extraction to collect further terms.
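The duplicate problem described above comes down to a missing existence check before saving. The following Python sketch shows that check. It is purely illustrative and does not reflect any memoQ API: it assumes a termbase modeled as a plain dictionary keyed by case-folded source term, and the name `add_new_terms` is hypothetical.

```python
def add_new_terms(termbase, accepted):
    """Add accepted (source, target) pairs to termbase, a dict keyed by the
    case-folded source term, skipping terms that already exist.
    Returns the source terms that were actually added (hypothetical helper)."""
    added = []
    for source, target in accepted:
        key = source.casefold()
        if key in termbase:
            continue  # already present: do not create a duplicate entry
        termbase[key] = {"source": source, "target": target}
        added.append(source)
    return added

termbase = {"print queue": {"source": "print queue", "target": "Druckwarteschlange"}}
accepted = [("Print queue", "Druckwarteschlange"), ("printer", "Drucker"), ("Printer", "Drucker")]
print(add_new_terms(termbase, accepted))  # ['printer']
```

Because each new entry is stored as soon as it is added, the same check also catches duplicates within a single batch of accepted candidates.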

Filters for mixed file formats and text

When importing a file into a project, the tool recognizes the file format and extracts the text accordingly. Unfortunately, many well-meaning clients try to make things “easy” for translators, and a Word or Excel file ends up filled with HTML-like content, for example. In such cases, tools usually have trouble marking up the text correctly. Examples include Excel files containing HTML-like strings (often exported from software products); XML files containing text with HTML tags (often used in learning management systems), where the XML file is just the container; and content within files that should not be touched during translation but should instead be treated as tags.

If such a file is imported with the default filter, all the HTML codes appear as regular text and are also counted as words in the statistics. Adding an HTML filter as a second, cascading filter creates the correct view of the content inside the Excel file. This is one of the most useful features for my daily work, as we often have to prepare files like these so that they can run through the translation process without being damaged. Up to now, we had to use workarounds to get results like this. The cascading filter functionality should not be misunderstood, however: it can handle text inside a file that belongs to a different file format, but not embedded files. So don’t be surprised if, when you import a Word file with an embedded PowerPoint file, the cascading filter does not offer PowerPoint as a second filter option. Embedded files cannot be extracted.
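In memoQ the cascading filter is set up in the import dialog, not in code. To illustrate what a second-stage HTML filter does with one spreadsheet cell, here is a minimal Python sketch based on the standard library’s `html.parser`. It splits the cell into tag runs, which would be protected, and text runs, which would be translated and counted. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class CellTagger(HTMLParser):
    """Second-pass filter for the text of one cell (hypothetical): records
    ('tag', raw_tag) and ('text', data) runs in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.runs = []

    def handle_starttag(self, tag, attrs):
        self.runs.append(("tag", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        self.runs.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.runs.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        self.runs.append(("text", data))

def split_cell(cell_text):
    parser = CellTagger()
    parser.feed(cell_text)
    parser.close()  # flush any buffered trailing text
    return parser.runs

def word_count(runs):
    """Count words only in text runs, so markup does not inflate statistics."""
    return sum(len(text.split()) for kind, text in runs if kind == "text")

runs = split_cell('Click <b>Save</b> to keep your <a href="x.html">changes</a>')
print(word_count(runs))  # 6
```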

Another new filter comes in especially handy for text files where the client specifies that “only the text between quote marks” (or between similar delimiters) should be translated. Up to now, that meant copying the file to Word, applying a style to everything that should not be touched and making memoQ ignore text with this style during import. Now there is the Regex text filter, which lets you define what text should be extracted from the file. As the name implies, you have to use regular expressions to define the text to import (or not import). Select the Regex text filter, select the option to import only selected text, add your regular expression and check the preview page to see whether it is correct (Figure 4).
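The exact regular expression syntax memoQ’s Regex text filter expects is not covered here. The following Python sketch only illustrates the principle for the “only the text between quote marks” case. It assumes double-quoted strings in which `\"` escapes a quote, and it records offsets so that translations can be written back into place. The pattern and function names are illustrative assumptions.

```python
import re

# Assumed rule: translatable text is whatever sits between double quotes;
# \" inside a string is an escaped quote, not a delimiter.
QUOTED = re.compile(r'"([^"\\]*(?:\\.[^"\\]*)*)"')

def extract_quoted(text):
    """Return (start, end, string) for the content of each quoted string."""
    return [(m.start(1), m.end(1), m.group(1)) for m in QUOTED.finditer(text)]

def reinsert(text, spans, translations):
    """Put translations back in place, last span first, so that the
    offsets of earlier spans stay valid."""
    for (start, end, _), new in reversed(list(zip(spans, translations))):
        text = text[:start] + new + text[end:]
    return text

source = 'msg_open = "Open file"\nmsg_close = "Close file"\n'
spans = extract_quoted(source)
print([s for _, _, s in spans])  # ['Open file', 'Close file']
print(reinsert(source, spans, ["Datei öffnen", "Datei schließen"]))
```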

Automation and beyond

Using the application programming interface (API), it has always been possible to add your own automation to memoQ, but now one such automation ships directly with the new version. After installing the content connector and activating the license, you can set up a connection to a folder. When you create that connection, you are actually setting up a new kind of project that specifies the file formats to import, the export path and any filters you might need for the files. You then create a new project in memoQ and connect it to the content source. The files in the specified folder can be imported, and the folder can be polled repeatedly for changed or new files. Note, however, that with this kind of project, all files have to be added via the content connector folder. Adding files in the usual way is not possible, as the links for adding documents are not available. This feature was officially implemented in version 5.0.21.
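The content connector’s internals are not documented in this review. The following Python sketch only illustrates the general idea of polling a folder for new or changed files. It assumes that a change in a file’s modification time counts as “changed”, and that `on_change` stands in for whatever import step follows. All names are hypothetical, and the default interval of 600 seconds matches the ten-minute example given later.

```python
import time
from pathlib import Path

def scan(folder, extensions):
    """Map each file in folder with an allowed extension to its modification time."""
    return {
        p.name: p.stat().st_mtime_ns
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    }

def changed_files(previous, current):
    """Names of files that are new since the last scan or whose timestamp changed.
    Deleted files are simply not reported."""
    return sorted(name for name, mtime in current.items() if previous.get(name) != mtime)

def poll(folder, extensions, on_change, interval_s=600, rounds=None):
    """Rescan the folder every interval_s seconds and call the hypothetical
    on_change hook for each new or changed file; rounds=None polls forever."""
    seen = {}
    done = 0
    while rounds is None or done < rounds:
        current = scan(folder, extensions)
        for name in changed_files(seen, current):
            on_change(name)  # e.g. import the file into the connected project
        seen = current
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval_s)
```

For a one-off check, something like `poll("incoming", {".docx", ".xlsx"}, print, rounds=1)` would list the matching files once, assuming a folder called `incoming` exists.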

memoQ WebTrans is another component that moves the worlds of translation and review closer together. It is a separate component that lets users translate or review translations in a web browser. The web-based interface looks a lot like the memoQ editor, but it has limited functionality and does not allow you to change any project settings (Figure 5).

The development team at Kilgray always amazes me by implementing quite complex ideas in an easy-to-use manner. Tracking changes is something people had been waiting for impatiently, and I can see many uses for this feature. Still, you will have to take some time to get to know all the possibilities: when to create snapshots, when they are created automatically, which versions to compare and so on. My favorites are the cascading filters and the Regex text filter, which came with version 5.0.53 and lets you specify which parts of a file should be extracted (very good for PHP files). That is mainly because these are the areas where I previously had to come up with lots of workarounds, macros and other inventive solutions to protect or extract text. The cascading filters also include a Regex tagger that lets you mark up elements in a file as tags. Using regular expressions will not appeal to everyone, as you more or less need to think like a programmer, but they are a great help in achieving things that the tool does not do by default.
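To illustrate the kind of job a Regex tagger does, here is a minimal Python sketch. It is not memoQ’s configuration format. It assumes a hypothetical pattern that treats printf-style placeholders (`%s`, `%d`), .NET-style placeholders (`{0}`) and PHP-style variables (`$name`) as non-translatable. These elements are replaced by numbered tags and restored after translation. The function names are hypothetical.

```python
import re

# Hypothetical pattern for elements that must not be translated:
# printf-style (%s, %d), .NET-style ({0}) and PHP-style variables ($name).
PROTECTED = re.compile(r"%[sd]|\{\d+\}|\$[A-Za-z_]\w*")

def tag_segment(segment):
    """Replace protected elements with numbered tags <1/>, <2/>, ... and
    return the tagged text plus the originals needed to restore them."""
    originals = []

    def to_tag(match):
        originals.append(match.group(0))
        return f"<{len(originals)}/>"

    return PROTECTED.sub(to_tag, segment), originals

def untag_segment(tagged, originals):
    """Restore the original elements in a (translated) tagged segment."""
    return re.sub(r"<(\d+)/>", lambda m: originals[int(m.group(1)) - 1], tagged)

tagged, originals = tag_segment("Hello $user, you have {0} new messages (%d unread).")
print(tagged)  # Hello <1/>, you have <2/> new messages (<3/> unread).
print(untag_segment("Hallo <1/>, Sie haben <2/> neue Nachrichten (<3/> ungelesen).", originals))
```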

Terminology extraction is also an excellent feature, but I think there is still room for development to make it more efficient to use, such as not creating duplicates when saving candidates to a termbase, or adding context sentences. The content connector was easy enough to set up for polling a folder on my local machine. When setting up a content provider folder with a service project, you can even have the system check the folder at a set interval, every ten minutes, for example.

WebTrans will offer a great way to share documents with nontranslators, such as in-country sales staff who are asked to proofread a file. They can now see the same thing the translators see, and changes can be discussed right inside the file itself rather than via annotated PDF or Word files. On the other hand, the WebTrans module is, in my view, not a full substitute for the memoQ translation environment. Depending on the internet connection, the screen may take some time to refresh after each change, and the user does not have the full range of settings available.