TBX Version 3 published at ISO
TermBase eXchange (TBX) is a family of XML-based terminology markup languages that allows for lossless exchange of terminology-related data and metadata. So far, two lighter-weight versions have been developed: TBX-Basic (published in 2008 by the LISA Terminology Special Interest Group) and TBX-Min (published in 2013 by LTAC Global). TBX-Min targets the use case of exchanging terminology with translators in the form of simple glossaries that can be mapped onto UTX. TBX-Basic, however, is more suitable for mapping between TBX and XLIFF 2.0.
In Version 3, Min and Basic have been recast as modules building on TBX Core, which is captured in ISO 30042:2019 along with the extensibility mechanism that allows for validating these and other dialects.
Localization veterans will remember that TBX 2.0, published in 2008, was the last TBX version published by LISA. It was also the first edition co-published with ISO TC 37/SC 3, as ISO 30042:2008. ISO 30042:2019 has replaced that first edition, so don’t be confused by the fact that the third major version of TBX is also the second edition of ISO 30042.
What does it mean for a localization professional that TBX has a major new version after almost 11 years, published by ISO on April 4, 2019?
LTAC released industry-oriented dialects compliant with ISO 30042 even before the final publication, as the standard could not undergo any technical changes after August 2018. These dialects are available through GitHub.io webpages, notably TBX-Basic Dialect Version 1.0 at https://ltac-global.github.io/TBX-Basic_dialect. LTAC also published a robust developer guide to the public dialects compliant with the new modular TBX at www.tbxinfo.net/tbx-downloads.
Although the ISO DIS 30042 ballot held in 2018 didn’t fail per se, the draft international standard received a large number of substantial comments that led to technical changes. The final draft international standard ballot was launched in November 2018, ended in late January 2019, and finally yielded the published international standard on April 4, 2019.
The most important data model change in ISO 30042:2019 is making the inline data model compliant with the XLIFF 2 inline data model. This was implemented by the LTAC/TerminOrgs TBX Steering Committee and TC 37/SC 3 with input from the XLIFF and XLIFF OMOS TCs. The modular public dialects (Min, Basic and Linguist) compatible with the second edition of ISO 30042 have already been released with the new inline data model.
The second ISO edition of TBX accumulated quite a number of changes, but most of them deal with modernization of the XML tooling. Apart from the inline data model change and the related introduction of explicit directionality support, it is important to stress that TBX has joined other modern-day standards in leaving its former monolithic design for a new modular design.
The second ISO edition of TBX specifies a nonnegotiable core and prescribes that all compliant dialects must include that core. The standard then specifies a modularity mechanism. Outside of ISO, LTAC made sure that the most common dialects are extended from the core using a telescoping principle. The simplest possible dialect is TBX-Core; TBX-Min is composed of the Core and Min modules, TBX-Basic is TBX-Min plus the Basic module, and TBX-Linguist is TBX-Basic plus the Linguist module. The stakeholders have high hopes that this will vastly improve the so-called “blind” or plug-and-play interoperability. The second ISO TBX edition also defines TBX agents and provides a compliance clause targeting document compliance as well as the compliance of specific agents.
Notably, this edition introduces namespace-based modules, albeit only in the DCT (data category as tag name) style. This is a compromise that should not trouble those who are not concerned with the DCA (data category as attribute name) style. Conversely, implementers who are wary of implementing namespaces can stick to the old DCA style. The good thing is that both DCA and DCT are extended from the same core, and modules specified in either DCA or DCT are semantically equivalent (based on the same sets of additional data categories). The DCT style is easier to validate, and it also makes it easier to filter out unsupported modules, as those are in different namespaces.
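The telescoping principle and the namespace-based filtering that the DCT style makes possible can be illustrated with a minimal Python sketch. It is not taken from the standard or from the LTAC dialect definitions: the namespace URIs, the `exampleCategory` elements, and the helper names `modules_for` and `strip_unsupported` are all hypothetical placeholders chosen for this illustration. Only the ordering Core, Min, Basic, Linguist comes from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, used only for this illustration.
# They are NOT the identifiers defined by ISO 30042:2019 or by LTAC.
CORE_NS = "urn:example:tbx:core"
MODULE_NS = {
    "Min": "urn:example:tbx:min",
    "Basic": "urn:example:tbx:basic",
    "Linguist": "urn:example:tbx:linguist",
}

# Telescoping order: each public dialect adds one module to the previous one.
MODULE_ORDER = ["Min", "Basic", "Linguist"]


def modules_for(dialect):
    """Return the modules a dialect adds on top of the core."""
    name = dialect.split("-", 1)[1]  # "TBX-Basic" -> "Basic"
    if name == "Core":
        return []
    return MODULE_ORDER[: MODULE_ORDER.index(name) + 1]


def namespace_of(tag):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def strip_unsupported(root, dialect):
    """Remove, in place, every element whose namespace the dialect lacks."""
    allowed = {CORE_NS} | {MODULE_NS[m] for m in modules_for(dialect)}
    for parent in list(root.iter()):
        for child in list(parent):
            if namespace_of(child.tag) not in allowed:
                parent.remove(child)
    return root


# A made-up DCT-style fragment; the module element names are placeholders.
SAMPLE = f"""
<termSec xmlns="{CORE_NS}"
         xmlns:min="{MODULE_NS['Min']}"
         xmlns:basic="{MODULE_NS['Basic']}">
  <term>localization</term>
  <min:exampleCategory>value from the Min module</min:exampleCategory>
  <basic:exampleCategory>value from the Basic module</basic:exampleCategory>
</termSec>
""".strip()

if __name__ == "__main__":
    root = strip_unsupported(ET.fromstring(SAMPLE), "TBX-Min")
    for child in root:
        print(namespace_of(child.tag), child.text)
```

Processed as TBX-Min, the fragment keeps the core `term` element and the Min-module element, while the Basic-module element is dropped purely on the basis of its namespace. A DCA-style document would need the agent to inspect attribute values instead, which is the extra effort the paragraph above alludes to.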
ISO 30042:2008 was identical to TBX 2.0 developed at LISA OSCAR. Despite some effort to schedule joint work with ISO, ETSI ISG LIS didn’t manage to renew work on the TBX work item inherited from LISA OSCAR. Instead, LTAC Global and TerminOrgs took to publishing TBX dialects or industry specifications as well as other terminology management-related resources based on the current and planned ISO versions of the TBX (default) standard. GALA and ttt.org still host the latest (albeit stale) LISA OSCAR version, including the legacy TBX-Basic.
Work in liaison with the OASIS XLIFF TC and later with the OASIS XLIFF OMOS TC resulted in the development and testing of a mapping from TBX-Basic to XLIFF 2 with the Glossary Module. The mapping, upgraded to the modular TBX-Basic based on the second ISO edition (which is semantically equivalent to the old TBX-Basic), is now being specified on the standards track within the OASIS XLIFF OMOS TC. See the latest editor’s draft at https://tools.oasis-open.org/version-control/browse/wsvn/xliff-omos/trunk/XLIFF-TBX/xliff-tbx-v1.0.pdf.