Copyright Foundation Takes Down Unauthorized Dutch AI Training Dataset

Dutch copyright foundation Stichting BREIN has taken down a dataset that was illegally used to train artificial intelligence (AI) models. The takedown marks a first in the Netherlands and highlights the growing tensions between AI development and intellectual property (IP) rights in the European Union (EU).

The dataset in question was vast, containing unauthorized copies of tens of thousands of books, millions of lines from news articles, and subtitles from numerous films and TV series. Stichting BREIN’s investigation revealed that the Dutch dataset was compressed specifically for use in training large language models (LLMs). The foundation is now working to identify AI models that may have utilized this data, signaling potential legal action against those involved.

According to an article in the NL Times, Stichting BREIN director Bastiaan van Ramshorst emphasized the seriousness of the violation, pointing out that over 10,000 instances of illegal copying were identified within the dataset. “The news articles were copied from websites with copyright reservations. This clearly shows that copyrights have not been respected,” van Ramshorst said.

This case underscores the ongoing global debate about the use of copyrighted material in LLM training data, particularly as the EU’s AI Act looms on the horizon. The act will require AI companies to obtain authorization to use copyrighted content and to disclose the datasets used in training their models. In the United States, lawsuits alleging copyright violation have been filed, such as The New York Times’ lawsuit against Microsoft and OpenAI, but many claim that training AI models on material from the internet is fair use.

The BREIN Foundation was established in 1998 to combat IP infringement on behalf of Dutch authors, artists, publishers, producers, and distributors of entertainment products. The foundation mostly targets violations that take place in or from the Netherlands, but also partners with international organizations to protect Dutch IP.

MultiLingual Staff
MultiLingual creates go-to news and resources for language industry professionals.
