The growing role of neural MT in the life sciences
Why neural machine translation (NMT)? Why now? Why life sciences? Good questions! It’s probably fair to say that there weren’t compelling answers to these questions just a few years ago. Today, however, the answers are quite clear, and this reflects the growing role of machine translation (MT), artificial intelligence (AI) and automation in general across the life sciences.
Of course, with the introduction of any new process automation, there will be concerns over quality. This is an important topic that we will return to later in the article to illustrate how businesses are overcoming these concerns. The technology is up to scratch and delivering results, so it’s more a case of designing a framework to validate quality and alleviate concerns.
This demand is only growing, and the existing, manually intensive workflows are not sustainable. As such, robotic process automation (RPA) is high on the list of priorities in the life sciences as a “need to have” rather than a “nice to have.” When it comes to language, this means NMT. Fortunately, many of the characteristics of the content and use cases in the life sciences lend themselves quite well to NMT (Figure 1), making these use cases very suitable candidates for bespoke solutions.
This process is known as case intake, and is typically either carried out by the pharma company itself or outsourced to a contract research organization (CRO). Traditionally, this is a highly manual, resource-intensive process that is decentralized across multiple local sites around the world, each carrying out the intake and interpretation of adverse event reports in its location. This is where the challenges start.
The first challenge is that the basic format of the reports is very heterogeneous, ranging from emails and web forms to letters and handwritten notes (Figure 2). More recently, the process also involves monitoring newer channels such as social media. Volumes are also growing rapidly, with companies receiving millions of reports per year, an unsustainable rate that creates large backlogs. And the kicker: the reports are increasingly multilingual.
This involves further intensive engagement with language vendors at significant additional cost, which also affects turnaround time for regulatory filings. There is a massive opportunity to introduce robotic process automation via AI and NMT, and many organizations are taking it. Let’s look at how.
There are multiple steps in the case intake process that can benefit from automation, even where there is no multilingual component (Figure 3).
The initial file intake and preparation step can make use of file handlers and optical character recognition (OCR) to get text into a machine-readable format. Then, of course, high-quality adapted NMT can be deployed to homogenize reports into English for processing. After that, various AI and natural language processing (NLP) methods can be used to automate document classification, detect specific entities, and extract information for case generation and summarization.
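To make the sequence of steps concrete, here is a minimal Python sketch of such an intake pipeline. It is an illustration only, not any vendor's implementation: the `Report` structure, the function names, the keyword list and the `translate` hook are all hypothetical, and the keyword matching merely stands in for the trained classification and entity detection models a real system would use. The text is assumed to have already passed through file handling and OCR.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical keyword list; a real system would use trained models instead.
SYMPTOM_TERMS = ("rash", "nausea", "headache")


@dataclass
class Report:
    text: str                  # machine-readable text (after file handling/OCR)
    language: str              # source language code, e.g. "de"
    english: str = ""
    category: str = ""
    entities: Dict[str, List[str]] = field(default_factory=dict)
    needs_review: bool = True  # the final quality control step always applies


def to_english(report: Report, translate: Callable[[str, str], str]) -> Report:
    """Homogenize the report into English via a caller-supplied NMT hook."""
    if report.language == "en":
        report.english = report.text
    else:
        report.english = translate(report.text, report.language)
    return report


def classify(report: Report) -> Report:
    """Toy document classification based on keyword hits in the English text."""
    lowered = report.english.lower()
    hit = any(term in lowered for term in SYMPTOM_TERMS)
    report.category = "adverse_event" if hit else "other"
    return report


def extract_entities(report: Report) -> Report:
    """Toy entity detection: collect the symptom terms found in the text."""
    lowered = report.english.lower()
    report.entities = {"symptoms": [t for t in SYMPTOM_TERMS if t in lowered]}
    return report


def run_intake(report: Report, translate: Callable[[str, str], str]) -> Report:
    """Run the steps in order: translate, classify, extract."""
    report = to_english(report, translate)
    report = classify(report)
    return extract_entities(report)


def fake_translate(text: str, source_language: str) -> str:
    """Stand-in for an NMT engine so that the sketch runs offline."""
    lookup = {"Patient hat Hautausschlag": "Patient has rash"}
    return lookup.get(text, text)


result = run_intake(Report("Patient hat Hautausschlag", "de"), fake_translate)
print(result.category, result.entities, result.needs_review)
```

Passing the translation engine in as a function keeps the sketch independent of any particular NMT product, and the `needs_review` flag stays set no matter how the earlier steps were completed.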
The benefits are huge. Centralizing the workflow and processing files in a single language reduces the outsourcing costs and internal overheads related to vendor engagement. It also dramatically reduces the time it takes to report on the adverse events, limiting the risk of late filings and subsequent fines.
Much of the above also applies to adverse event reporting for medical devices, and goes to show why RPA is such a hot topic in the healthcare industry.
Clinical trials are run either by the pharmaceutical company itself or, more typically, by CROs. They are complex and take place over an extended period of time, often at multiple sites in different countries. This creates a language challenge.
- Investigator’s brochure: a comprehensive live document describing the drug in question, updated as the trial runs.
- Informed consent forms: the form signed by participants that includes all information about the trial from their perspective.
- Protocols: description of the methodology, objectives and other considerations of the trial.
- Ethics committee letter: request for clearance to run the trial from national ethics committee, including lots of supplementary information.
All of the above information needs to be translated into the language of the trial site, and global trials can be run in multiple countries. Once trials are underway, more information will be generated in the target language in the form of reports, which will also need to be translated back to the original language for consumption and dissemination.
Translation in these cases is again often decentralized, with local sites operating independently and coordinating with vendors. In this case, there are no inherent cost savings based on volumes and there can be disparities in quality across locations.
The role that NMT and automation have to play in clinical trial translation is multifaceted. In one sense, it is a part of a more traditional localization workflow involving post-editing in order to translate vast amounts of information into multiple languages in a more productive manner.
Additionally, a lot of the information generated during trials doesn’t necessarily need to be translated for external stakeholders, but rather for in-house teams, perhaps in the pharmaceutical company’s headquarters. In this case, raw MT output can be more than sufficient and gives the teams access to information they would not typically be able to review on demand.
Lastly, NMT allows the above to be more centralized, giving the CRO and the investigator more clarity on the process and related costs, and probably cutting out quite significant overheads.
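The split described above, between content that goes through post-editing and content for which raw MT is enough, could be expressed as a simple routing rule. The following Python sketch is purely illustrative: the `Audience` categories and workflow labels are assumptions made for this example, and a real organization would set its own policy.

```python
from enum import Enum


class Audience(Enum):
    # Hypothetical audience categories for illustration only.
    INTERNAL = "internal"   # in-house teams, e.g. at headquarters
    EXTERNAL = "external"   # trial sites, participants, regulators


def choose_workflow(audience: Audience) -> str:
    """Pick a translation workflow based on who will read the output."""
    if audience is Audience.INTERNAL:
        return "raw_mt"               # raw MT for on-demand review
    return "mt_plus_post_editing"     # MT followed by human post-editing


print(choose_workflow(Audience.INTERNAL))
print(choose_workflow(Audience.EXTERNAL))
```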
For clinical trials, much of the information is very template-driven and specific to a particular organization and its products, meaning NMT systems can be heavily trained on key aspects of terminology.
In addition to important resources that end users can bring to the table, such as product lists and other data that may have been collected over time, there are ample data resources around regulatory activities that can be exploited and leveraged as needed to create bespoke solutions for specific users.
Automation is an enabler. With the right quality assurance processes in place, as the phrase suggests, absolute quality can be assured. Users get all the benefits of automation — faster turnaround times, lower costs and lower overheads — with no change in the end result.
Consider the case intake automation workflow above; there is a quality control step at the end that will remain whether earlier steps in the process, including translations, are completed by people, machines or a combination of both. With the advances in NMT, and the fact that the pace of developments and improvements is still very high, things are only going to get better. As a consequence, the time to process information through the entire workflow decreases, as does the quality assurance overhead.
End users need to fully understand what is happening with data in the NMT system being used, where it’s being processed and what happens with the data once translation has been completed.
We are now at an inflection point when it comes to the adoption of MT by enterprise users. Massive leaps in the quality of NMT over the past few years have brought a change in attitude and perception toward translation automation. As an industry that has been more hesitant to deviate from standard operating procedures, the life sciences are now very well positioned to capitalize on this wave of change, and I, for one, look forward to seeing where it takes them!