The year 2022 was filled with impressive human-quality claims in the machine-learning (ML) field, driven by large language models and generative pre-trained transformers, but experts have raised their voices about some serious issues. Moreover, the constant announcements of improvement in MT are no longer confined to the language industry; they are now reaching the general public, as well as clients. The democratization of AI technology is finally becoming a reality, so tech specialization seems to be the differentiating factor for the success of companies in the language industry. Consequently, LSPs are now facing the need to address clients’ expectations about MT: lower rates, higher volumes, faster deadlines, and more target languages, as well as varying degrees of quality. Furthermore, new challenges are on the table: correct integration of AI technology into LSP workflows, responsible use of MT, relationships of trust with post-editors, and good MT quality estimation.
The good news is that we can understand these challenges, their possible effects on LSPs, and the measures LSPs should take to survive in the market, as well as what to expect in the future.
The “S” growth curve of MT, as with any other technology, is unstoppable and impressive. Investment in this field of research needs to continue to be consistent and reach new goals. LSPs are consequently forced to keep an eye on the latest trends, while benefiting from the advancements and adapting their production workflows accordingly. Nevertheless, the path for embracing this technology must be carefully designed.
One of the challenges posed by this new AI-MT technology is its tendency to hallucinate, presenting made-up information as fact. This is aggravated by the fact that the models are black boxes: experts do not know exactly how well they are performing or how to replicate bugs and fix them. Inconsistency is another problem. Real-world data engineering is therefore the key to making these models robust. Such engineering should involve not only research on the models themselves but also efforts to label data, gather high-quality training data from the real world, publish that data, increase the amount of data in languages other than English, and create more customization options for MT users.
We must also consider hardware capabilities and the investment needed to adopt this new era of large, multilingual models. While NMT can run on a combination of CPUs and GPUs, large models need far more computational power to handle such an immense quantity of data and computation. Before moving on to larger models, companies need to recoup their previous investments in traditional NMT, which will probably take a number of years.
In light of the preceding, we can anticipate that the first major challenge to LSPs is the ability to trust MT. MT is still not 100% reliable, and LSPs should focus on how to use it responsibly and control it correctly. Therefore, companies should demand standardized quality evaluation models.
LSPs should also work with different levels of quality according to a client’s needs and should make sure that work at the highest quality level is always reviewed by a human. This leads us to the role of the professional linguist. Linguists will continue to play a major part in the language industry, even as MT becomes more widely used. They should be specifically trained to recognize the main errors made by MT, which becomes harder as fluency increases, and should be given more job opportunities. MT is likely to make fewer mistakes, so post-editing will focus more on subject-area adaptation, terminology, and cultural sensitivity. It is important that LSPs train their linguists in MTPE, as we did with our MTPE Course, so that they can rely on a trusted pool of experts. Looking ahead to other strategies, quality estimation technology will become key in this whole process, as will data engineering services. Mastering such technology will be essential for LSPs, because those who manage AI better than their competitors will be the ones to succeed.
The human touch
So, how can an LSP like CPSL integrate the next generation of MT into its workflows? By continuing to automate processes through MT integration with all company tools, while maintaining a quality-driven, human-in-the-loop approach. We have recently created a new department called “New Technologies and AI Solutions,” whose functions include, among other things, establishing quality evaluation guidelines that enable correct and responsible MT integration, while constantly exploring new trends. The essential idea is to be enthusiastic about new developments, as long as quality is guaranteed.
As for upcoming trends, large multilingual models will become the most widely used technology in the market. Moreover, AI will be applied beyond MT itself, for example to improve the source text, to automatically estimate MT quality, or to automatically post-edit, correcting common errors and highlighting others, thereby assisting the post-editor in their work. In addition, more subject areas, such as creative domains, will probably become suitable for MT, allowing more content to be created on the fly, and probably directly in several languages. Finally, instead of maintaining a separate engine for each domain, we will probably have one large model that we can adapt to our needs through domain- and register-specific prompts.
For further information about this topic, you can check out this podcast: