United States judges affirm that digitizing and training LLMs on published books falls under fair use
Meta’s Llama and Anthropic’s Claude can legally be trained on copyrighted books, according to two recent US court decisions that mark a pivotal moment in the legal debate surrounding large language models (LLMs) and intellectual property (IP). In lawsuits brought by authors who claimed their works were used without consent, both companies successfully defended their artificial intelligence (AI) training practices under the doctrine of “fair use,” a critical concept in US copyright law.
Training LLMs is “transformative,” not derivative
Anthropic’s case hinged on its practice of purchasing physical books, digitizing them into a searchable library, and using that library to select training material for Claude. In a June 23 ruling, US District Judge William Alsup found that both the digitization and the model training qualified as fair use. He reasoned that Anthropic did not alter the creative content of the books or share them publicly. Instead, it changed their format and used them in a new context: AI training.
Judge Alsup emphasized that the Copyright Act protects derivative works that involve new creative material — such as translations, dramatizations, or adaptations — but that LLM training does not fall into that category. The act of distilling patterns and structure from a large corpus, he concluded, is transformative rather than exploitative.
This distinction may also affect translation rights. The ruling suggests that LLM-generated translations of books would still require approval from copyright holders, but scanning and training on already translated versions would likely be considered fair use.
Style isn’t copyrightable, but expression is
A separate case involving Meta’s Llama yielded a similar result. On June 25, US District Judge Vince Chhabria ruled against a group of 13 authors who alleged that Meta’s LLM was trained to “regurgitate” their works and compete with human-created content.
Judge Chhabria rejected this claim, noting that Llama would not reproduce more than 50 consecutive words from any copyrighted book and was instead trained to emulate writing styles. He clarified that style is not protected by copyright, whereas specific expression is, and that there was no evidence that Llama replicated identifiable passages verbatim. He added that LLMs are “innovative tools” capable of many tasks, including translation, but not designed to generate entire works that directly compete with original authors.
Implications for AI, publishing, and localization
These rulings set a strong precedent in favor of AI developers, reinforcing the notion that training LLMs on copyrighted books, without reproduction or distribution, falls within legal bounds. However, they leave key questions unresolved, especially regarding full-text outputs, translations, and derivative content across languages.
For the localization industry, the distinction between style, structure, and expression may become increasingly relevant as AI tools are used to support multilingual content creation. If AI-generated translations are treated differently from original training material, legal clarity will be needed on who holds the rights to translated outputs.
The debate has extended beyond the courtroom. In a June 28 episode of the All-In Podcast, David Sacks, former “AI Czar” under the Trump administration, voiced his support for the rulings, stating that “if an AI model violates someone’s copyright by outputting something that’s identical, then obviously that’s a violation. But if all they’re doing is transforming the work… then that is not a violation of copyright.”

