Applying corpus linguistics in the courthouse

Machine translation (MT) and natural language processing (NLP) specialists are likely to be familiar with corpus linguistics: the study of language through large collections of authentic texts, used to identify patterns in a language’s syntactic or semantic constructions.

Now, it looks like the next group of professionals to hone their understanding of corpus linguistics could be lawyers. According to a report from the law firm Bradley Arant Boult Cummings LLP, a journal article on corpus linguistics was used as a tool for analyzing a case before the Supreme Court. This is the first time corpus linguistics has played a role at this level, though other courts have used it on occasion, typically where dictionaries do not provide an adequate definition for a given word or phrase.

“It remains to be seen whether (and how) the Supreme Court will address corpus linguistics later this summer, but this method of statutory interpretation is already appearing in other courts,” the report reads.

Corpus linguistics is a subfield of linguistics that was particularly critical during the early days of NLP and MT technology. In corpus linguistics, researchers study a corpus to see how specific words or grammatical constructions occur in authentic contexts. For example, researchers may search a corpus like the Corpus of Contemporary American English, which consists of more than one billion words compiled from texts like books, newspapers, and journals, to find a particular word and identify co-occurring phrases or grammatical structures.
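
As a rough illustration of that kind of search, the Python sketch below counts the words that appear near a target word. The three sentences are invented for the example and the function name `collocates` is hypothetical; a real study would query a full corpus and use proper tokenization rather than a simple regular expression.

```python
import re
from collections import Counter

def collocates(texts, target, window=2):
    """Count words that occur within `window` tokens of `target`."""
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i, tok in enumerate(tokens):
            if tok == target:
                start = max(0, i - window)
                neighbors = tokens[start:i] + tokens[i + 1:i + 1 + window]
                counts.update(neighbors)
    return counts

# Invented toy corpus, standing in for a real collection of texts.
toy_corpus = [
    "The defendant agreed to carry a firearm in the car.",
    "She had to carry the groceries up the stairs.",
    "Officers may carry a firearm while on duty.",
]

result = collocates(toy_corpus, "carry")
print(result.most_common(5))
```

Ranking the neighbors by frequency gives a first impression of which words a term tends to keep company with, which is the raw material for judging how it is typically used.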

Multilingual corpora have been especially useful in developing MT systems, as they typically contain large numbers of texts that have already been translated into another language. As such, they were crucial in the development of statistical machine translation systems, which create translations by analyzing bilingual texts and determining the most statistically likely translation of a novel word or passage.
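
To give a sense of how a system can learn likely translations from bilingual text alone, here is a minimal sketch of one classic word-level approach, IBM Model 1 trained with expectation-maximization. It is a simplified teaching version (no NULL word, no phrase handling), not a description of any particular production system; the three sentence pairs and the names `train_ibm1` and `best_translation` are assumptions made for the example.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=20):
    """Estimate t[(f, e)] = P(target word f | source word e) with EM."""
    target_vocab = {f for _, tgt in pairs for f in tgt}
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)  # start with no preference
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for src, tgt in pairs:
            for f in tgt:
                # E-step: share each target word among the source words
                # in proportion to the current translation probabilities.
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize the expected counts into probabilities.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t

def best_translation(t, source_word):
    """Return the target word with the highest estimated probability."""
    candidates = {f: p for (f, e), p in t.items() if e == source_word}
    return max(candidates, key=candidates.get)

# Invented toy parallel corpus (English source, German target).
pairs = [
    ("the house".split(), "das haus".split()),
    ("the book".split(), "das buch".split()),
    ("a book".split(), "ein buch".split()),
]

t = train_ibm1(pairs)
print(best_translation(t, "house"))
```

No dictionary is supplied: the model works out which words correspond purely from the fact that some words keep appearing together across the sentence pairs.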

In the context of law, the use of corpora may be a bit less technologically complex, more or less resembling traditional corpus linguistics methods. Bradley Arant Boult Cummings LLP notes that a corpus-based or corpus-driven approach can have advantages over using a dictionary as a tool in a court of law because it shows how a given word or phrase is typically used.

While a dictionary may be useful in identifying common meanings, those meanings are occasionally contradictory. Corpus linguistics can allow legal scholars, attorneys, or judges to determine the most likely meaning of a word based on its context.


Andrew Warner
Andrew Warner is a writer from Sacramento. He received his B.A. in linguistics and English from UCLA and is currently working toward an M.A. in applied linguistics at Columbia University. His writing has been published in Language Magazine, Sactown Magazine, and The Takeout.

MultiLingual Media LLC