The growing role of neural MT in the life sciences

John Tinsley

John Tinsley is the CEO of Iconic Translation Machines Ltd. He holds a PhD in machine translation (computing) from Dublin City University and also consults as a language technology expert with the European Commission.

Why neural machine translation (NMT)? Why now? Why life sciences? Good questions! It’s probably fair to say that there weren’t compelling answers to these questions just a few years ago. Today, however, the answers are quite clear, and this reflects the growing role of machine translation (MT), artificial intelligence (AI) and automation in general across the life sciences.

Why now?

Why not? NMT is already being widely adopted across a variety of industries, from legal to automotive to IT, and is having a transformative effect on how businesses deal with multilingual content. Organizations that lag behind risk losing their competitive edge as others take the plunge.

Of course, with the introduction of any new process automation, there will be concerns over quality. This is an important topic that we will return to later in the article to illustrate how businesses are overcoming these concerns. The technology is up to scratch and delivering results, so it’s more a case of designing a framework to validate quality and alleviate concerns.

Why life sciences?

Like many businesses, global life sciences organizations are feeling the push and pull demands of a growing multilingual content landscape. From pharmaceutical companies, to contract research organizations (CROs), to medical device developers, there are increasing market pressures to get information out to customers, patients and stakeholders around the world, faster and more cost-effectively.

Figure 1: There are a variety of enterprise applications for NMT.

There are also significant regulatory requirements to understand and report on information such as adverse events emanating from multiple sources, and in multiple languages, often with a very short turnaround time.

This demand is only growing, and the existing, manually intensive workflows are not sustainable. As such, robotic process automation (RPA) is high on the list of priorities in the life sciences as a “need to have” rather than a “nice to have.” When it comes to language, this means NMT. Fortunately, many of the characteristics of the content and use cases in the life sciences lend themselves quite well to NMT (Figure 1), making them very suitable candidates for bespoke solutions.

Figure 2: Adverse event reports can take many forms, all of which need translation.

What are these use cases?

Life sciences is quite broad, so let’s focus on a few specific areas where NMT and, more broadly, natural language processing (NLP) are gaining traction: pharmacovigilance and adverse event reporting, global clinical trials and medical device development.

Pharmacovigilance

Pharmacovigilance (also known as PV, or PhV) is the process by which companies monitor various channels in order to identify and act upon potential issues with their products in the market, such as side effects of drugs or faults in medical devices. As these issues can have serious implications for the health of individuals, the process is heavily regulated by bodies such as the FDA (Food and Drug Administration) in the USA, and the EMA (European Medicines Agency) in the EU.

Figure 3: The steps of the case intake process.

In the case of drug side effects — more commonly referred to as adverse events — the process involves accepting adverse event reports from various sources including medical practitioners and directly from patients. These reports need to be triaged and categorized according to the product, the side effects and other pieces of information (such as dosage) in order to generate a case and, where relevant, make a regulatory submission.

This process is known as case intake, and is typically either carried out by the pharma company itself or outsourced to a CRO. Traditionally, this is a highly manual, resource-intensive process that is decentralized across multiple local sites around the world, each carrying out the intake and interpretation of adverse event reports in its own location. This is where the challenges start.

The first challenge is that the basic format of the reports is very heterogeneous, ranging from emails and web forms to letters and handwritten notes (Figure 2). More recently, the process also involves monitoring newer channels such as social media. Volumes are also growing rapidly: companies receive millions of reports per year, a rate that is unsustainable and creates large backlogs. And the kicker: the reports are increasingly multilingual.

Handling multilingual reports involves further intensive engagement with language vendors at significant additional cost, which also affects turnaround time for regulatory filings. There is a massive opportunity to introduce robotic process automation via AI and NMT, and many organizations are taking it. Let’s look at how.

There are multiple steps in the case intake process that can benefit from automation, even where there is no multilingual component (Figure 3).

The initial file intake and preparation step can make use of file handlers and optical character recognition to get text into machine-readable format. Then, of course, high-quality adapted NMT can be deployed to homogenize reports into English for processing. After that, various AI and NLP methods can be used to automate document classification, entity detection, information extraction for case generation, and summarization.
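As a rough illustration only, the steps described above can be sketched in a few lines of Python. Everything named here is an assumption made for the example rather than part of any real product: the `Report` record, the `toy_translate` placeholder standing in for a call to an adapted NMT engine, the product list and the list of side-effect terms are all hypothetical, and the text is assumed to have already been extracted by file handlers or optical character recognition. A production system would use trained classifiers and entity recognizers rather than keyword matching.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical reference data; a real deployment would use the
# organization's own product lists and medical terminology.
PRODUCTS = ["DrugX"]
EVENT_TERMS = ["headache", "nausea", "rash"]
DOSAGE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|ml|g)\b", re.IGNORECASE)


@dataclass
class Report:
    raw: str        # machine-readable text (assumed already extracted)
    language: str   # source language code, e.g. "de"
    english: str = ""
    category: str = ""
    entities: Dict[str, List[str]] = field(default_factory=dict)


def toy_translate(text: str, source_lang: str) -> str:
    """Placeholder for an adapted NMT engine (hypothetical, lookup only)."""
    if source_lang == "en":
        return text
    lookup = {
        ("de", "Kopfschmerzen nach Einnahme von DrugX 20 mg"):
            "Headache after taking DrugX 20 mg",
    }
    return lookup.get((source_lang, text), text)


def process(report: Report,
            translate: Callable[[str, str], str] = toy_translate) -> Report:
    # Step 1: homogenize the report into English.
    report.english = translate(report.raw, report.language)
    text = report.english.lower()

    # Step 2: naive keyword classification (stand-in for a trained model).
    events = [term for term in EVENT_TERMS if term in text]
    report.category = "adverse_event" if events else "other"

    # Step 3: extract the fields needed to generate a case.
    report.entities = {
        "product": [p for p in PRODUCTS if p.lower() in text],
        "event": events,
        "dosage": DOSAGE.findall(report.english),
    }
    return report
```

Passing the translation step in as a function keeps the rest of the workflow unchanged whether a given report is translated by an NMT engine, by a person, or not at all because it arrived in English.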

The benefits are huge. Centralizing the workflow and processing files in a single language reduces the outsourcing costs and internal overheads related to vendor engagement. It also dramatically reduces the time it takes to report on the adverse events, limiting the risk of late filings and subsequent fines.

Much of the above also applies to adverse event reporting for medical devices, and goes to show why RPA is such a hot topic in the healthcare industry.

Clinical trials

Another burgeoning area for NMT in the life sciences is clinical trials. These trials are a key step in the drug development process whereby an organization, having already invested heavily and achieved a number of key milestones, is ready to test its drugs with real patients (Figure 4).

Clinical trials are run either by the pharmaceutical company itself or, more typically, by CROs. They are complex and take place over an extended period of time, often at multiple sites in different countries. This creates a language challenge.

Figure 4: The drug development process, and the multiple phases of clinical trials.

Many key documents come into play at the outset of a trial, along with information generated while the trial runs. These include:

Investigator’s brochure: a comprehensive live document describing the drug in question, updated as the trial runs.
Informed consent forms: the forms signed by participants, which include all information about the trial from their perspective.
Protocols: description of the methodology, objectives and other considerations of the trial.
Ethics committee letter: request for clearance to run the trial from the national ethics committee, including lots of supplementary information.

All of the above information needs to be translated into the language of the trial site, and global trials can be run in multiple countries. Once trials are underway, more information will be generated in the target language in the form of reports, which will also need to be translated back to the original language for consumption and dissemination.

Translation in these cases is again often decentralized, with local sites operating independently and coordinating with vendors. In this case, there are no inherent cost savings based on volumes and there can be disparities in quality across locations.

The role that NMT and automation have to play in clinical trial translation is multifaceted. In one sense, it is a part of a more traditional localization workflow involving post-editing in order to translate vast amounts of information into multiple languages in a more productive manner.

Additionally, a lot of the information generated during trials doesn’t necessarily need to be translated for external stakeholders, but rather for in-house teams, perhaps in the pharmaceutical company’s headquarters. In this case, raw MT output can be more than sufficient and gives the teams access to information they would not typically be able to review on demand.

Lastly, NMT allows the above to be more centralized, giving the CRO and the investigator more clarity on the process and related costs, and probably cutting out quite significant overheads.
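The two modes of use described above, post-edited NMT for content that reaches external stakeholders and raw NMT for in-house review, amount to a simple routing decision. The sketch below is a minimal illustration under stated assumptions: the `Document` record, its `audience` flag and the workflow names are hypothetical, and in practice the decision would also depend on each organization's own quality and regulatory requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    name: str
    audience: str  # assumed flag: "external" or "internal"


def choose_workflow(doc: Document) -> str:
    """Pick a translation workflow based on who will read the result."""
    if doc.audience == "external":
        # Content for participants, regulators or ethics committees
        # goes through NMT followed by human post-editing.
        return "nmt_plus_post_editing"
    # Content read only by in-house teams can use raw NMT output.
    return "raw_nmt"


for doc in (Document("Informed consent form", "external"),
            Document("Site progress report", "internal")):
    print(doc.name, "->", choose_workflow(doc))
```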

Adapting NMT for life sciences

Another key driver of the adoption of NMT in the life sciences is that the types of content being translated lend themselves well to domain-adapted solutions. For pharmacovigilance, content tends to be short and specific, referring to particular drugs and side effects.

For clinical trials, much of the information is very template-driven and specific to a particular organization and its products, meaning NMT systems can be heavily trained on key aspects of terminology.

In addition to important resources that end users can bring to the table, such as product lists and other data that may have been collected over time, there are ample data resources around regulatory activities that can be exploited and leveraged as needed to create bespoke solutions for specific users.

Let’s talk about quality

When it comes to MT, questions about the reliability of the output always come to the fore. As an industry, life sciences has been much slower to adopt not only MT, but other forms of automation, because of concerns over quality. This is understandable because the consequences of getting something wrong can be dire. However, there is no need to worry.

Automation is an enabler. With the right quality assurance processes in place, as the phrase suggests, absolute quality can be assured. Users get all the benefits of automation — faster turnaround times, lower costs and lower overheads — with no change in the end result.

Consider the case intake automation workflow above: there is a quality control step at the end that will remain whether earlier steps in the process, including translation, are completed by people, machines or a combination of both. With the advances in NMT, and the fact that the pace of development and improvement is still very high, things are only going to get better. As a consequence, the time to process information through the entire workflow decreases, as does the quality assurance overhead.

Let’s not forget data security

Of more concern to end users, when it comes to using MT in healthcare, should be what is happening with the information being processed. This is not just confidential information, but also personally identifiable information and personal health information, which are strictly controlled under regulations around the world, including GDPR in Europe and HIPAA in the USA.

End users need to fully understand what is happening with data in the NMT system being used, where it’s being processed and what happens with the data once translation has been completed.

We are now at an inflection point when it comes to the adoption of MT by enterprise users. Massive leaps in the quality of NMT over the past few years have brought a change in attitude and perception toward translation automation. As an industry that has been more hesitant to deviate from standard operating procedures, life sciences organizations are now very well positioned to capitalize on this wave of change, and I, for one, look forward to seeing where it takes them!
