The Ethical Supply Chain Problem
Our localization team’s conscious choice to deviate from a standard GenAI approach stemmed from the fact that the current generation of GenAI comes with hidden costs often glossed over in boardroom presentations. While a chatbot interface is clean and user-friendly, the machinery behind it is often fraught with challenges.
First, there is the persistent issue of machine translation quality. While GenAI has improved the standard, the output suffers from hallucinations — invented content that deviates from the source. In creative writing, this might be an interesting feature; in our industry, where accuracy and brand trust are paramount, it is a non-negligible risk. We cannot afford for a wellness app button or an offer disclaimer to be translated creatively rather than accurately.
In parallel, we had a deep hesitation on ethical grounds. The current wave of generic LLMs suffers from what we might call an ethical supply chain problem. Many public models are built on questionable content scraping technologies and the mass ingestion of translation memories (TMs) without consent. To train the models, this new technology consumes the intellectual property of human professionals (creators and translators) who are rarely, if ever, remunerated or even credited. So we asked ourselves: Is it a healthy choice to build our work efficiencies on the foundations of appropriated labor? Is it ethically sustainable to use tools that take advantage of the professionals who make global communication possible?
Finally, there is the environmental cost. AI data centers demand an enormous amount of energy that, in many cases, does not come from renewable sources. Beyond electricity, these facilities also require a staggering amount of water for cooling, often competing with local communities for freshwater in drought-prone areas. In the context of the global climate emergency, using resources at an unsustainable rate threatens the ecosystem that supports us all. As a localization team in a company dedicated to wellbeing, we could not in good conscience take a road that would lead to savings and efficiencies at any cost.
Redefining AI: Intelligent Automation
With all this in mind, we went back to the drawing board. The directive was to use AI to increase efficiency, but the specific tool was not prescribed. We asked a pivotal question: What if AI didn’t mean GenAI in the popular sense? What if we interpreted the mandate as intelligent automation instead?
We decided to look inward. We realized that our concerns about using public AI models didn’t mean rejecting automation entirely. And we weren’t starting from zero. Over the years, our in-house localization team had built a massive repository of high-quality TMs. Much like building muscle memory through consistent, disciplined training, these assets were created and maintained by our human experts through years of hard work. We also had a robust technology stack based on Phrase, a platform we had already tailored to our needs and integrated into our ecosystem. The adoption of this technology and our committed work had already generated production efficiencies. But if we were to take things to the next level while maintaining the wellbeing of our language operations, we would need to use technology in a smart, innovative way.
This exploration led us to our solution: Phrase’s Custom AI. This choice represented a fundamental divergence from the GenAI-for-everything trend. Unlike generic models that act as black boxes trained on the entire Internet, this technology trains neural machine translation (NMT) engines exclusively on our own linguistic assets.
This option offered several critical advantages.
First, by training the model strictly on our own curated TMs, we eliminated the risk of GenAI hallucinations. Because our technology of choice relies entirely on our approved terminology and brand voice, it produces reliable, highly accurate output and reduces “creative” deviations.
Second, by following a walled garden approach, we could select the materials used to train the model, which kept us in control of the process and ensured that our data never leaked into public models. Likewise, using our own TMs meant we would not have to rely on content scraped from the Internet, allowing us to avoid regulatory risks regarding IP (among others).
Finally, the approach improved environmental sustainability. The energy and water demands of smaller-scope NMT models are considerably lower than those of GenAI models.
Holistic Implementation
Choosing the right approach to technology was only the first step. To make intelligent automation work, we had to ensure our inputs were pristine. As the old adage goes, “garbage in, garbage out.” We knew that an AI engine is only as good as its training data, so we followed a strict protocol to ensure our automation remained high-quality and reliable.
The process began with a comprehensive data audit. Our TMs are the result of relentless and meticulous work over years, but like any legacy database, they weren’t perfect. Before we trained a single engine, we looked inward and performed a rigorous data detox of our existing assets. We removed outdated terminology, fixed historical inconsistencies, and aligned our segmented data with our current brand voice. By ensuring that the training data was pristine, we guaranteed that our technology would learn from the best version of our work rather than replicating our past mistakes.
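As an illustration only, a data detox of this kind could be sketched as follows. This is a minimal sketch, not our actual pipeline: the function name `audit_tm`, the (source, target) pair format, and the sample segments and deprecated term are all hypothetical, and a real audit would read from a TMX export or the translation platform itself.

```python
from collections import defaultdict


def audit_tm(entries, deprecated_terms):
    """Split (source, target) pairs into clean entries, conflicts, and flagged entries."""
    # Step 1: normalize whitespace, then drop empty segments and exact duplicates.
    seen = set()
    deduped = []
    for source, target in entries:
        pair = (" ".join(source.split()), " ".join(target.split()))
        if not pair[0] or not pair[1] or pair in seen:
            continue
        seen.add(pair)
        deduped.append(pair)

    # Step 2: a source with more than one distinct target is a historical inconsistency.
    by_source = defaultdict(list)
    for source, target in deduped:
        by_source[source].append(target)
    conflicts = {s: ts for s, ts in by_source.items() if len(ts) > 1}

    # Step 3: flag targets that still contain outdated terminology.
    lowered = [term.lower() for term in deprecated_terms]
    flagged = [p for p in deduped if any(term in p[1].lower() for term in lowered)]

    # Only entries with no open issue are kept as training data.
    flagged_set = set(flagged)
    clean = [p for p in deduped if p[0] not in conflicts and p not in flagged_set]
    return clean, conflicts, flagged


# Hypothetical sample data, invented for this sketch.
sample = [
    ("Check in", "Hacer check-in"),
    ("Check in ", "Hacer check-in"),        # duplicate once whitespace is normalized
    ("Your plan", "Tu plan"),
    ("Your plan", "Su plan"),               # same source, two different targets
    ("Join a gym", "Unete con OldBrand"),   # contains a deprecated term
    ("", "Texto sin origen"),               # empty source segment
]
clean, conflicts, flagged = audit_tm(sample, deprecated_terms=["OldBrand"])
```

Conflicting and flagged segments are reported rather than silently fixed, on the assumption that a linguist decides which version reflects the current brand voice.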
Once the data was clean, we focused on glossary integration. Wellhub’s terminology is highly specific to the corporate wellness space. For instance, eligibles is the term we use for the employees who have access to our platform; likewise, members is not quite the same as users. We invested heavily in updating our glossaries within our platform and ensuring they were consistently enforced by the technology engine. This step was crucial to prevent the engine from translating words based on probability alone. Instead, the engine adheres to our specific corporate lexicon, ensuring consistency and quality across all user touchpoints.
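To make the idea of glossary enforcement concrete, here is a minimal sketch of a post-translation terminology check. It is an assumption-laden illustration: the glossary contents, the Spanish target terms, and the function `glossary_violations` are invented for this example, and the check is a simple QA pass over finished output, not a description of how the engine applies glossaries internally.

```python
import re

# Hypothetical glossary (source term -> required target term), invented for this sketch.
GLOSSARY = {
    "eligibles": "elegibles",
    "members": "miembros",
}


def glossary_violations(source, target, glossary):
    """Return the source terms whose required target term is missing from the translation."""
    violations = []
    for src_term, tgt_term in glossary.items():
        # Whole-word, case-insensitive matching on both sides.
        in_source = re.search(rf"\b{re.escape(src_term)}\b", source, re.IGNORECASE)
        in_target = re.search(rf"\b{re.escape(tgt_term)}\b", target, re.IGNORECASE)
        if in_source and not in_target:
            violations.append(src_term)
    return violations


# "members" was rendered as "usuarios" (users), so the check reports it.
bad = glossary_violations(
    "All members get access", "Todos los usuarios tienen acceso", GLOSSARY
)
# The compliant translation produces no violations.
good = glossary_violations(
    "All members get access", "Todos los miembros tienen acceso", GLOSSARY
)
```

Exact whole-word matching will miss inflected forms, so a production check would likely need lemmatization or per-language rules.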
Finally, we designed the workflow to support adaptive learning. The true power of this NMT technology lies in its ability to improve over time. The system is not static. When our internal linguists edit AI output and make a correction, that correction doesn’t just fix the sentence at hand; it feeds back into a TM that is periodically used to retrain the model. This means that if we change a preferred term today, the technology will learn it and apply it in its future output. This creates a virtuous cycle: the more we use the system, the smarter it gets, and the less our humans have to correct the same errors over and over again.
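The feedback loop described above can be sketched in a few lines. This is a simplified model under stated assumptions: the `TranslationMemory` class, its method names, and the retraining threshold are hypothetical and exist only to show how a linguist’s correction becomes future training data.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Toy TM: maps each source segment to its latest linguist-approved target."""
    segments: dict = field(default_factory=dict)
    pending_updates: int = 0  # entries added or changed since the last retraining

    def record_post_edit(self, source, mt_output, edited):
        """Store the approved target; return True if the linguist changed the MT output."""
        if self.segments.get(source) != edited:
            self.segments[source] = edited
            self.pending_updates += 1
        return edited != mt_output

    def should_retrain(self, threshold):
        """Hypothetical trigger: retrain once enough new material has accumulated."""
        return self.pending_updates >= threshold


tm = TranslationMemory()
# The linguist corrects the engine's output; the correction is stored in the TM.
was_corrected = tm.record_post_edit("Your plan", "Su plan", "Tu plan")
# The linguist accepts the output unchanged; it still becomes an approved segment.
was_accepted_as_is = not tm.record_post_edit(
    "Check in", "Hacer check-in", "Hacer check-in"
)
# Repeating an identical correction adds nothing new.
tm.record_post_edit("Your plan", "Su plan", "Tu plan")
```

Counting only new or changed entries keeps repeated identical corrections from triggering retraining on their own.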
The Proof of Concept
Theory is necessary but not enough; we also needed to prove this worked in practice. After some small, controlled tests with promising results, we skipped a small-scale pilot, instead applying the new workflow directly to the State of Work-Life Wellness report. One of our most important marketing efforts targeting executives and human resources leaders, this annual 20,000-word publication offers thought leadership on understanding, measuring, and improving employee wellbeing — and boosting worker satisfaction, retention, and performance along with it. It was a great opportunity to determine whether our new approach could handle volume and complexity without sacrificing the nuanced quality required for such a high-profile publication.
In this new workflow, instead of outsourcing the report to an external vendor for translation, we processed it with our AI technology. Then, the NMT output was edited by our team of lead linguists. The result? A resounding success. By removing the dependency on language service providers and using our custom engine for the first pass, we dramatically accelerated our timeline, reducing our complete turnaround time (external translation and in-house review) by almost 50%. It’s important to note that this speed did not come at the cost of quality. Because the engine was trained specifically on our data, the raw output already met our usual high standards, requiring significantly less remediation than generic engine output would have.
Apart from the significant spend reduction associated with the automation, our efficiency gains were transformative overall. Even though editing NMT output takes slightly more time than editing human-translated content, we not only improved production speed while maintaining quality but also eliminated the administrative time invested in vendor management (requesting quotes, emailing files, chasing deadlines, feedback rounds, etc.). This freed up additional bandwidth, which we shifted toward value generation. As a result, our workflow pivoted from a transactional model of outsourcing to an internalized model of expert review.
Elevating Human Expertise
These days, there seems to be a shared idea about our industry that technology has rendered human expertise obsolete and that the role of people is simply to polish whatever content the machine provides. Our experience proved the opposite. Technology did not replace our localization team; it elevated it, making it more relevant to our company operations than ever before.
Under our previous hybrid model, our internal localization leads primarily functioned as quality gatekeepers. They were tasked with editing and polishing translations delivered by external vendors, which often meant we were paying twice: once for the vendor’s translation and again for our internal leads’ time to review and manage it. Now our localization experts possess full ownership of the content localization pipeline. Our technology handles the heavy lifting of the first draft, reducing the cognitive fatigue of repetitive, manual corrections. This terminology-compliant output helps our team flex their creative muscles and focus entirely on the last mile of quality and stylistic fluency, a task requiring true human nuance, empathy, creativity, and cultural understanding.
This shift redefined the daily roles of our localization team. Our lead linguists have more time and capacity to ensure our core mission is intrinsic to every piece of content we publish. They act as cultural sentinels, catching idioms or concepts that might be technically correct but culturally jarring for a specific market, target audience, asset type, or communication channel. Furthermore, they are the trainers of our own engine. They clearly see that their feedback directly improves the system, giving them a sense of ownership over the technology rather than the prospect of being replaced by it. We proved that keeping the human in the localization loop isn’t just an ethical safeguard; it is a quality necessity.