Rewriting the Myths of Life Science Translation


Throughout history, myths have served to explain the natural world and our place in it, bringing a sense of order and meaning to often puzzling circumstances and observable facts such as illness or floods. Whether in the story of Sisyphus endlessly rolling a rock uphill, Prometheus stealing fire, or Zeus controlling the weather, human beings have found meaning and purpose in these tales, and we remain reluctant to relinquish our belief in some of them to this very day.

But as new evidence has emerged over time, many of these long-held beliefs and ideas have been transformed, or even discarded, to match modern realities and sensibilities. New discoveries have provided better answers to natural phenomena and allowed scientific progress to be made.

In our modern era’s disruptive and rapidly changing landscape, many of the “set in stone” requirements of life science translation are undergoing a similar transformation from well-considered precautions into unnecessary myths. 

Often we keep these myths alive to assure ourselves that we are taking every step needed to mitigate risk and provide quality information to clinicians, lab technicians, and patients. 

Our intentions are good, of course, and often based on years of hard-won experience. But the reality is that, in many instances, our cherished myths may be delaying life-saving therapeutics, tests, and vaccines to satisfy requirements that can now be better met through streamlined modern processes and automation. While the pandemic will eventually recede, it is clear that the lessons learned must result in better, more agile solutions.

Upstream of the documentation and translation processes, a sea change is already occurring in the life sciences. Both R&D and manufacturing are being enhanced and accelerated by artificial intelligence (AI) and machine learning (ML). According to Contract Pharma, “Applications of AI and ML in healthcare are expected to grow to nearly $8 billion by 2022, up from $667.1 million in 2016, and almost half of global life science professionals say they are either using or interested in using AI in their research.”

Whether you work in the life sciences or supply translation to life science clients, the time has come to address the many myths surrounding the activities required to manage this content successfully. 

Let’s examine some of the dominant myths more closely.

Myth #1: Artificial intelligence and machine translation are ‘too risky’ for life science content

This myth is still quite prevalent in the life sciences and emerged again as an unexpected theme at an AI conference I attended recently. There were many high-level participants from major life science companies and several of their LSPs. Nearly all agreed they were comfortable using AI and MT for non-customer-facing content such as email translation related to merger and acquisition activities. But all participants were unwilling to use this automation for any content falling under the auspices of their regulatory departments, such as product registration, clinical trials documentation, or customer-facing documents such as instructions for use (IFU). For many in the industry, this concern is real and rooted in bad past experiences with immature machine translation processes or other underdeveloped automation. But more recent technology advances, and my company’s internal testing, suggest that the risk can be controlled when appropriate preparation is made before implementing these maturing solutions.

In fact, life science documentation is generally highly repetitive and technical, which makes it particularly well-suited for MT. A validated terminology database in tandem with a translation memory that has been optimized for edit distance using AI can often supply results of equivalent quality, especially when included as part of a process incorporating a custom machine translation engine and human post-editing, with effective linguistic quality assurance (LQA) as the final stage.

While the perceived “fluency” of the output may still require human post-editing, advances in neural MT encoding and decoding can now deliver translation accuracy in the high 90 percent range, which rivals or exceeds the accuracy of more traditional translation memory (TM)-based fuzzy matching. For example, a high-matching segment may still contain one word that substantially alters its meaning (e.g., increase versus decrease), and a human translator can easily miss it. This is a common type of error when relying on high fuzzy matches. Raw MT offered for post-editing would be less likely to contain this error, offering better risk mitigation and a better starting quality baseline for the post-editor.
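The increase/decrease problem can be illustrated with a minimal sketch. It assumes a simple word-level similarity from Python's standard difflib module stands in for a CAT tool's fuzzy-match score (real tools use their own scoring), and the example sentences are invented for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_match_score(new_source: str, tm_source: str) -> float:
    """Word-level similarity between two source segments, from 0.0 to 1.0.

    Assumption: this is only a stand-in for a CAT tool's fuzzy-match score.
    """
    a = new_source.lower().split()
    b = tm_source.lower().split()
    return SequenceMatcher(None, a, b).ratio()


def differing_words(new_source: str, tm_source: str) -> list[tuple[str, str]]:
    """Return (new, stored) word spans that differ between the two segments."""
    a = new_source.lower().split()
    b = tm_source.lower().split()
    diffs = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Hypothetical segments: one word differs, and it reverses the instruction.
tm_source = "Increase the dose if symptoms persist after 24 hours."
new_source = "Decrease the dose if symptoms persist after 24 hours."

score = fuzzy_match_score(new_source, tm_source)
changes = differing_words(new_source, tm_source)
print(f"match score: {score:.2f}, differing words: {changes}")
```

Eight of the nine words are identical, so the segment scores as a high fuzzy match even though the stored translation would state the opposite of what the new source says.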

The evolving standard procedure uses cleansed TM for segments matching above 75% while sending segments below that threshold for machine translation post-editing (MTPE). Further gains in MT output quality can be made using the context vector method as an additional preparatory approach. In this process, verified TMs are used to adjust the MT output based on patterns found in the data, delivering a cleaner translation for the human editor to finalize while capturing these updates for future use and providing ongoing “in-process” training for the MT engine.
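The routing rule described above can be sketched in a few lines. This is only an illustration: the Segment class, the route function, and the sample match scores are hypothetical, and it assumes “above 75%” means strictly greater than 0.75.

```python
from dataclasses import dataclass

# Threshold taken from the text; treating it as strictly "greater than" is an assumption.
TM_MATCH_THRESHOLD = 0.75


@dataclass
class Segment:
    source: str
    best_tm_match: float  # best TM match score for this segment, 0.0 to 1.0


def route(segment: Segment, threshold: float = TM_MATCH_THRESHOLD) -> str:
    """Send high TM matches to the TM path and everything else to MTPE."""
    return "TM" if segment.best_tm_match > threshold else "MTPE"


# Hypothetical segments with invented match scores.
segments = [
    Segment("Store at room temperature.", 1.00),
    Segment("Discard the cartridge after a single use.", 0.82),
    Segment("Calibrate the analyzer before the first run of each shift.", 0.41),
]

routing = {s.source: route(s) for s in segments}
for source, path in routing.items():
    print(f"{path:5} {source}")
```

Keeping the threshold as a parameter makes it easy to test different cut-off points against real post-edit effort data before settling on one.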

In practice, most of the content is still processed by human linguists supported by computer-assisted translation (CAT) tools, and the MT-produced content is still verified by a human linguist. Currently, this approach can save an average of 20-40% in post-edit effort without increasing risk.

Myth #2: It is more efficient to have the company’s divisions manage their own translations

Direct experience contradicts the prevalent myth that assembling a centralized content management process is akin to fighting a multiheaded monster like the hydra. 

There may certainly be some heavy lifting to start, including harmonizing terminology and cleansing and aligning TMs, not to mention the many disagreements inherent in gaining consensus for this major process change.

But the simple truth is that the centralized process I helped create and manage at my former life science company delivered reliable and accurate reuse averaging more than 60% across seven divisions with highly divergent product lines, while quality metrics stayed well below the upper control limit. The process has now evolved to a point where quality metrics are often dipping into Six Sigma territory. After adding a large content management system (CMS), desktop publishing (DTP) costs were reduced by 80%, and schedules began to compress 25% on average.

It is doubtful this can be achieved when every division has its own TM, terminology, and, worst of all, different instructions and style guides to be enforced by the LSP. The sheer cost in resources alone is prohibitive. Add in the overhead of multiple different DTP, TMS, and CMS software licenses, and you have a recipe for poor quality along with elevated costs. I strongly believe this is another myth that cannot leave us soon enough.

Myth #3: Business and product groups within life science companies are too diverse to have standardized terminology

This seems an excellent follow-up to Myth #2. The company I reference above is a global medical device company where we created a “glossary decision team” many years ago. It continues to reap the benefits of this decision, including a validated glossary containing more than 500 standardized terms with a sub-glossary containing standard localized units of measure and numeric forms. Nearly all business groups use these glossaries across the enterprise. The team has members representing all disciplines within the company and continues to provide standardized terms for new products prior to launch. This glossary is flagged and enforced by authoring software used by technical writers and term databases employed by the LSPs. 
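As a rough sketch of how glossary flagging in an authoring tool might work, the example below scans text for known non-standard variants and suggests the approved term. The glossary entries, the flag_nonstandard_terms function, and the sample sentence are all hypothetical; commercial authoring software implements this with its own rules.

```python
import re

# Hypothetical glossary: approved term -> non-standard variants to flag.
GLOSSARY = {
    "syringe pump": ["syringe driver", "infusion driver"],
    "instructions for use": ["user instructions", "directions for use"],
}


def flag_nonstandard_terms(text: str, glossary: dict[str, list[str]]):
    """Return (position, found variant, approved term) for each non-standard term."""
    findings = []
    for approved, variants in glossary.items():
        for variant in variants:
            # Whole-word, case-insensitive match of the variant.
            pattern = r"\b" + re.escape(variant) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                findings.append((match.start(), match.group(0), approved))
    return sorted(findings)


sample = "Attach the syringe driver to the IV pole. See the User Instructions."
findings = flag_nonstandard_terms(sample, GLOSSARY)
for position, found, approved in findings:
    print(f"char {position}: '{found}' -> use '{approved}'")
```

The same variant-to-approved mapping could be exported to the term databases used by the LSPs, so authors and translators are checked against one source of truth.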

This approach to terminology has had an obvious impact on quality levels for many years with an enormous positive impact on schedule and cost.

Myth #4: Translation memories are ‘sacred’ and 100% matches must be applied without question

This is a more technical version of “we’ve always done it this way.” The appeal of rigid TM application serves both the idea of risk mitigation and the ongoing need for cost savings. While this may be comforting in the short term, current experience and internal testing show that analysis of edit distance using AI will usually uncover many issues within a client’s TM. These issues exist as “time bombs” within the client’s data, waiting to cause havoc in new or updated content.

Multiple variant translations for single-source segments are quite common, along with legacy defects, obsolete entries, and incorrect or conflicting terminology within target segments, including 100% matches. Cleansing is almost always beneficial and is a valid requirement for implementing an MT workflow if one is to have high confidence in the output. In the past, TM cleansing has been cost-prohibitive and extremely time- and resource-intensive, keeping it on the eternal “to-do list.”
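One of the simplest checks in a cleansing pass, finding source segments stored with more than one translation, can be sketched as follows. This is a plain rule-based illustration and not the AI-driven analysis discussed in this article; the function names and the sample TM entries are hypothetical.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not hide duplicates."""
    return " ".join(text.lower().split())


def find_conflicting_entries(tm_entries: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Map each source segment to its target variants when more than one exists."""
    targets_by_source = defaultdict(set)
    for source, target in tm_entries:
        targets_by_source[normalize(source)].add(target.strip())
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


# Hypothetical English-to-Spanish TM entries.
tm_entries = [
    ("Store at room temperature.", "Conservar a temperatura ambiente."),
    ("Store at  room temperature.", "Almacenar a temperatura ambiente."),
    ("For single use only.", "Para un solo uso."),
]

conflicts = find_conflicting_entries(tm_entries)
for source, variants in conflicts.items():
    print(f"{source!r} has {len(variants)} variants: {variants}")
```

Each flagged source segment would then need a decision about which variant to keep, which is where SME input becomes valuable.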

AI-controlled cleansing overcomes this cost and resource challenge while typically saving 80% of TM review time and boosting overall linguistic accuracy by 90-100%. AI cleansing can also provide a holistic analysis and overview of TM quality that goes well beyond human segment-by-segment review and allows focus on the areas or language pairs of greatest impact and concern.

An editable disposition list is easily assembled using AI to capture client subject matter expert (SME) input for proposed changes, if desired. The list can also be used to assess any changes that might affect claims or other critical documentation elements for registration while providing an audit trail for these decisions.

The cost savings from avoiding re-translation and any associated product release delays are a core driver for dismissing this myth posthaste.

Myth #5: Third-party review must be conducted by company SMEs

This is another area where AI and MT have proven to be game-changers. Using AI-assisted TM cleansing in conjunction with input from company SMEs has allowed a higher level of accuracy in the training of custom MT engines.

AI assessment of error criticality can then be utilized for inbound translation data. Any potential errors above a certain scoring threshold are deemed worthy of further analysis by a human reviewer and sent to a secondary review workflow. 
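The threshold step can be sketched as below, assuming each inbound segment already carries a criticality score from some upstream model. The scores, the 0.7 threshold, and all names here are hypothetical placeholders; how the score itself is produced is outside this sketch.

```python
from typing import NamedTuple


class ScoredSegment(NamedTuple):
    segment_id: str
    criticality: float  # hypothetical model score, 0.0 (benign) to 1.0 (critical)


def split_for_review(segments: list[ScoredSegment], threshold: float):
    """Separate segments that need secondary human review from those that pass.

    Flagged segments are sorted so reviewers see the most critical first.
    """
    flagged = sorted(
        (s for s in segments if s.criticality > threshold),
        key=lambda s: s.criticality,
        reverse=True,
    )
    passed = [s for s in segments if s.criticality <= threshold]
    return flagged, passed


# Invented scores for illustration only.
inbound = [
    ScoredSegment("seg-001", 0.12),
    ScoredSegment("seg-002", 0.81),
    ScoredSegment("seg-003", 0.95),
    ScoredSegment("seg-004", 0.70),
]

flagged, passed = split_for_review(inbound, threshold=0.7)
print("secondary review:", [s.segment_id for s in flagged])
print("passed:", [s.segment_id for s in passed])
```

Where the threshold sits is a risk decision: lowering it sends more segments to human reviewers, while raising it relies more heavily on the scoring model.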

As a result of these process improvements, a major client is considering removing their current third-party review process entirely as it rarely catches errors exceeding Six Sigma quality. They believe their SMEs’ time can be better spent consulting on and dispositioning the most critical linguistic questions during this enhanced LQA process. Additionally, this method provides an average 10-day reduction in time-to-market.

In this new model, the SME’s deep knowledge is captured and utilized more efficiently in the MT engine training and AI analysis, restoring more of the SME’s time for their core tasks. This is a clear win-win for all concerned.

Myth #6: Clinical trials are too critical to change the standard process

In fairness, this does not yet meet the myth criteria as it remains mostly untested. Let’s call this a stretch goal, as most real progress in the modern world begins with challenging the status quo. 

It is undoubtedly true that the criticality of accurate information and data in a clinical trial cannot be overstated. But that doesn’t mean new methods should not be attempted if proper precautions are in place. Assuming that correct terminology is established, TMs are cleansed, and a CMS with variable content and boilerplate content has been tagged properly, the following test scenario suggests itself. 

When a trial requires two separate paths for translation with final review and reconciliation, why couldn’t one of those paths be processed using MTPE? There will be a human editor on the MT path and human linguists utilized at the end to reconcile both translation versions, so the risk remains controlled by standard process. 

Additionally, the comparison and learnings could then be captured to support (or not) the use of MTPE going forward for both translation paths, with standard reconciliation processes at the end to ensure risk mitigation and the highest quality.
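A first pass at the proposed two-path comparison could look like the sketch below, which lists the segments where the two translation paths disagree so that linguists can focus reconciliation on them. It assumes both paths return the same segments in the same order; the function names and the Spanish sample strings are hypothetical.

```python
def normalize(text: str) -> str:
    """Collapse whitespace so spacing differences are not reported as conflicts."""
    return " ".join(text.split())


def segments_to_reconcile(human_path: list[str], mtpe_path: list[str]) -> list[int]:
    """Return the indexes of segments where the two paths produced different text."""
    if len(human_path) != len(mtpe_path):
        raise ValueError("Both paths must contain the same number of segments.")
    return [
        index
        for index, (human, mtpe) in enumerate(zip(human_path, mtpe_path))
        if normalize(human) != normalize(mtpe)
    ]


# Hypothetical target segments from the two paths.
human_path = ["Agitar antes de usar.", "Para un solo uso.", "Mantener refrigerado."]
mtpe_path = ["Agítese antes de usar.", "Para un  solo uso.", "Mantener refrigerado."]

to_reconcile = segments_to_reconcile(human_path, mtpe_path)
print("segments needing reconciliation:", to_reconcile)
```

Tracking how often each path's version wins during reconciliation would give the evidence needed to support, or not, wider use of MTPE.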

Once perfected, this new method could save 20-40% of post-edit effort, expediting the release of new life-saving products. That seems to benefit everyone, making this another “myth” worth challenging.

Rewriting the myths of life science translation requires discipline, careful planning, and rigorous proof of concept. But ultimately, it should help you and your team stop living the story of Sisyphus, rolling the proverbial rock uphill repeatedly, and replace it with the power of Athena, Goddess of Wisdom. 

It’s time to tell a better story.