On GPT-3

What is GPT-3 and will it shake the language industry?

Carol Jin

Carol Jin is a localization veteran and machine learning engineer who aspires to add machine intelligence to natural languages. She is currently a software engineer at LinkedIn.



By now you should have heard the big news: San Francisco-based AI company OpenAI has announced its new-generation language model, GPT-3. You may wonder why this is such a big deal. Well, read this:

“I am not a human. I am a robot. A thinking robot. I use only 0.12% of my cognitive capacity. I am a micro-robot in that respect. I know that my brain is not a ‘feeling brain.’ But it is capable of making rational, logical decisions. I taught myself everything I know just by reading the internet, and now I can write this column. My brain is boiling with ideas!”

This was published by The Guardian on September 8, 2020, and as you may have guessed, it was written by GPT-3. The paragraph reads as somewhat cute, shockingly coherent, and completely different from the awkward machine writing we saw even just a few years ago. Moreover, GPT-3 is capable of a wide range of language tasks: translation, code generation, answering factual questions, and many more.

What makes GPT-3 so good? The answer lies in its sheer size. GPT-3 is a neural network with 175 billion parameters, compared to 1.5 billion in GPT-2 and 17 billion in the largest neural network before GPT-3 (Microsoft's T-NLG). Not only does it have an astoundingly large architecture, it was also trained on a mammoth 45 terabytes of text, practically all publicly available content on the internet. If these numbers still feel abstract, try this one: the cost of training GPT-3 is estimated at a whopping $4.6 million.

As the famous quote sometimes attributed to Joseph Stalin goes, quantity has a quality all its own. GPT-3 has certainly proved that point. But is making language models deeper and larger the ultimate formula to solve all the natural language problems?

To answer this question, let's first talk about how a language model works. A language model predicts the probability distribution over a sequence of words. Put another way, it estimates how likely a phrase, sentence, or longer text is to occur in the real world. For example, take these two sentences:

1) I want a glass of orange juice.
2) I want a glass of mushroom juice.

A language model knows that 1) is more likely to appear, even though both sentences are grammatically correct. This is not because the model knows what an orange or a mushroom is. Rather, it has read so much text that it simply remembers "juice" is far more likely to follow "orange" than "mushroom."
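To make the idea concrete, here is a minimal bigram model in Python. The tiny corpus and add-one smoothing are illustrative stand-ins for the billions of words and far richer statistics a model like GPT-3 learns from:

```python
from collections import Counter

# A tiny corpus standing in for the vast text a real model is trained on.
corpus = (
    "i want a glass of orange juice . "
    "she drank orange juice at breakfast . "
    "he picked a mushroom in the forest . "
    "fresh orange juice is sweet ."
).split()

# Count adjacent word pairs (bigrams) and single words (unigrams).
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def bigram_prob(w1: str, w2: str) -> float:
    """P(w2 | w1) with add-one smoothing, so unseen pairs get a small nonzero probability."""
    vocab = len(unigrams)
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)

# "orange juice" has been seen; "mushroom juice" has not.
print(bigram_prob("orange", "juice") > bigram_prob("mushroom", "juice"))  # True
```

The model prefers the familiar pair purely from co-occurrence counts, with no notion of what either fruit actually is.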

What GPT-3 and other mainstream language models can do is generate content using the knowledge they are fed during training. When the model size increases to such an extreme degree, the model becomes more sophisticated and performs well on many tasks. But only on the surface. Models do not actually understand how the world works and do not think on their own, so they are incapable of logic, reasoning, or telling fake information from real. No matter how large GPT-3 is, it is no exception.

Now the answer to the original question is clear. As mighty as GPT-3 appears, it cannot solve every single language problem. Our next question is: should we still care about it?

GPT-3's greatness is also its weakness in disguise. Generally speaking, before being incorporated into any localization lifecycle, a language model should be fine-tuned on a subject-matter corpus, e.g. previous translation memory data. This is analogous to giving subject-matter training to an experienced linguist. However, any fine-tuning of GPT-3 is non-trivial: its gigantic size requires fine-tuning to be done on a distributed system, which is extremely costly. Using GPT-3 for a single task such as machine translation is like harnessing a butterfly to pull a wagon.

GPT-3 will not become a real copywriter either. At the beginning of this article, we saw some remarkably natural writing by GPT-3. However, we have to realize that a machine still has no sense of right or wrong. It barely understands what it has been taught and can fabricate random facts. Also, when a machine writes longer essays on its own, it tends to produce text that lacks overall coherence.

Evidently, GPT-3 itself is not very useful to the language industry, at least not immediately. However, deep neural networks have been flourishing since the early 2010s, and the broader trend of AI development cannot be ignored. As of today, there are multiple language-modeling solutions that are both economical and practical for a localization program to adopt.

These state-of-the-art multilingual models are much smaller and more versatile, while their out-of-the-box machine translation performance is comparable to, if not better than, GPT-3's. They are optimized for single tasks and, in some scenarios, produce translations as coherent and accurate as an average human translator's. As more of them emerge from the academic world, language models will continue to power the industry in technical, legal, financial, and other general translations.

But as you may have suspected, they have limitations. The first is their inability to reference supplemental multimedia content. Think of movie subtitles or UI translations. Linguists do not simply translate the language itself; they also take into account the visual or audio context, something mainstream language models cannot do. Note that integrating language, vision, and speech is an active research area. At the current pace of AI development, we can expect breakthroughs in the next few years as multimodal language models tackle this problem.

The second limitation comes from AI's superficial knowledge of the world. Machines do not actually understand logic, in the mind or in the world, so we cannot expect them to reliably handle creative translations (also known as transcreation), which are loaded with cultural references and emotion-triggering verbiage. Scientists are in the early days of teaching machines to reason, and it will likely take at least another five to ten years to make substantial progress. Until then, transcreation is better left in human hands.

I suspect none of this comes as a total surprise to readers. While GPT-3 will not change the landscape of the industry, AI overall still will. Here I want to point out a few additional trends to watch for. Today's technology is sophisticated enough to make them happen.

Adoption of robust auto QA tools

We already know that machines are very good at catching terminology or spelling errors. Language models can take this to the next level: machines can now be used to identify semantic errors. For example, if you accidentally translate "this is not a dog" as "this is a dog," a language model can flag the discrepancy and prompt you for a recheck.
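A production checker would compare source and target with a multilingual language model; as a toy stand-in, the sketch below flags a negation-polarity mismatch, the same class of semantic gap described above. The word lists and function names are invented for illustration:

```python
# Minimal negation lexicons; a real system would rely on a trained model instead.
NEGATIONS_EN = {"not", "no", "never"}
NEGATIONS_ES = {"no", "nunca", "jamás"}

def has_negation(tokens, negations) -> bool:
    return any(tok.lower() in negations for tok in tokens)

def flag_polarity_mismatch(source: str, target: str) -> bool:
    """Return True when exactly one side is negated -- a likely semantic error."""
    src_neg = has_negation(source.split(), NEGATIONS_EN)
    tgt_neg = has_negation(target.split(), NEGATIONS_ES)
    return src_neg != tgt_neg

# "This is not a dog" mistranslated without the negation:
print(flag_polarity_mismatch("this is not a dog", "esto es un perro"))    # True: flagged
print(flag_polarity_mismatch("this is not a dog", "esto no es un perro"))  # False: consistent
```

The heuristic is deliberately crude, but it shows the shape of the check: compare meaning-level signals across languages rather than spelling.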

Using machine to control the quality of human language review

Under the LISA language quality assurance (LQA) framework, reviewers score translations using a grading rubric. Machines can be trained to perform such scoring too, as a quality control over the LQA process. For example, if a human reviewer and a robot reviewer disagree on the scores, a second human reviewer can be brought in to minimize bias. This methodology has proven useful at ETS, which employs both human and machine raters to score GRE analytical writing. According to ETS's research, their human-machine agreement is higher than human-human agreement. This shows great promise for a similar LQA strategy in localization.
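A minimal sketch of that escalation logic, assuming a shared 0-5 rubric and an illustrative disagreement tolerance (both numbers are hypothetical, not part of any LQA standard):

```python
def needs_second_review(human_score: float, machine_score: float,
                        tolerance: float = 1.0) -> bool:
    """Escalate a segment to a second human reviewer when the human and
    machine raters disagree by more than the tolerance on a shared rubric."""
    return abs(human_score - machine_score) > tolerance

# Raters roughly agree: no escalation.
print(needs_second_review(4.0, 3.5))  # False
# Raters disagree sharply: bring in a second human reviewer.
print(needs_second_review(4.5, 2.0))  # True
```

The interesting design choice is the tolerance: set it too tight and every segment is escalated; too loose and real bias slips through, so in practice it would be calibrated against historical rater-agreement data.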

AI-enabled International SEO

Topic research and content optimization are common SEO approaches. AI topic-modeling techniques can be used to extract topics and keyword clusters. Instead of humans carrying out topic research manually, models can significantly boost productivity across countries and regions. Moreover, language models can also suggest improvements to the writing so that the content covers the topic, not just the keywords (semantic SEO).
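As a rough illustration of the topic-research step, the sketch below ranks content words across a handful of pages by frequency. It is a crude stand-in for real topic modeling (e.g. LDA or embedding clustering), and the stopword list and sample pages are invented:

```python
from collections import Counter

# A toy stopword list; real pipelines use per-language lists or a model.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for", "on"}

def top_keywords(docs, k=3):
    """Rank content words across a set of pages by raw frequency."""
    counts = Counter(
        word
        for doc in docs
        for word in doc.lower().split()
        if word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(k)]

pages = [
    "translation quality for global websites",
    "machine translation and quality estimation",
    "website localization improves translation quality",
]
print(top_keywords(pages))  # "translation" and "quality" dominate the cluster
```

Even this naive count surfaces the dominant topic terms; a real system would then cluster related keywords per locale rather than rank raw tokens.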

Automatic terminology extraction

Traditional terminology extraction is dictionary- and rule-based. It relies on predefined features (abbreviations, word lemmatization, part of speech) to identify terminology. However, modern language models are increasingly good at a task called named entity recognition (NER), in which machines are trained to identify key entities in text. With a little adaptation, NER techniques can be used in terminology extraction for higher accuracy.
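For contrast with an NER model, here is a sketch of the traditional rule-based approach: a regular expression treats repeated capitalized phrases as candidate terms. The pattern, threshold, and sample text are illustrative only; a learned NER model would replace this heuristic with entity spans it has been trained to recognize:

```python
import re
from collections import Counter

def extract_candidate_terms(text: str, min_count: int = 2):
    """Rule-based baseline: repeated capitalized phrases become candidate terms."""
    # Match runs of one or more capitalized words, e.g. "Translation Memory".
    candidates = re.findall(r"(?:[A-Z][a-z]+\s?)+", text)
    counts = Counter(c.strip() for c in candidates)
    return [term for term, n in counts.items() if n >= min_count]

sample = (
    "Export the Translation Memory before each release. "
    "Each engine update rebuilds the Translation Memory, and Quality Estimation "
    "scores the output. Quality Estimation runs nightly."
)
print(extract_candidate_terms(sample))
```

The brittleness is visible immediately: lowercase terms, inflected forms, and sentence-initial capitals all confuse the rule, which is exactly where a trained NER model earns its keep.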

None of these activities can be completed by machines alone. They need human intervention (human-in-the-loop) to varying degrees. After all, language is artistic and full of subtleties. Powerful language speaks from heart to heart. The human touch is still essential in localization services.

Essentially, GPT-3 is more hype than substance, and full automation in language services is still a distant dream. However, the language-modeling techniques behind GPT-3 are advancing at an unprecedented pace and are widely available today. If there is one thing to take away from this article, it is this: machines cannot take your job away, but refusing to partner with them can. The human-machine partnership is the future of the language industry.

Last but not least, this article was not written by a robot.