Study for rights holders: AI training is a copyright infringement


The Copyright Initiative (Initiative Urheberrecht, IU) says that a study it commissioned shows that reproducing works by means of generative artificial intelligence (AI) models, such as ChatGPT from OpenAI or Gemini from Google, constitutes a copyright-relevant act of reproduction. This could have far-reaching consequences, for example for the continued usability of chatbots. A closer look at the technology used shows that “training such models is not just a matter of text and data mining,” explained Hannover law professor Tim W. Dornis, who carried out the analysis together with Magdeburg computer scientist Sebastian Stöber. “This is a copyright infringement.”


Neither German nor European copyright law contains an applicable exception to the exclusive right of exploitation that would permit use for commercial AI training, Dornis said when the study was presented in the EU Parliament on Thursday. With their work, the two professors essentially want to shed light on the black box of how large language models are trained. According to the study, AI developers extract and use syntactic, and therefore copyright-relevant, information from the works in the training data on a large scale.

The core finding of the study is that copyrighted works are copied during data collection, are represented in full or in part in the AI models, and can ultimately be reproduced by end users. During training, “several different acts of reproduction of copyrighted works occur.” This starts with their “collection, preparation and storage”. Copyright-relevant copies are also created “inside” the model during both pre-training and fine-tuning. Although there is no explicit storage mechanism, training data is demonstrably “memorized” in current generative models, that is, retained by them.

Finally, the researchers found that when generative AI models are used, especially by their end users, the works used for training can be copied and adapted. According to the study, this also infringes the right to publish the works.

The problem: ChatGPT and similar chatbots, as well as image generators such as DALL-E, Stable Diffusion and Midjourney, are based on large language and image models. Operators train them with millions of photos, audio files and texts found on the Internet. As a rule, they do not ask authors and rights holders whether they agree to this use. The large-scale use of protected works in AI model training is necessary so that algorithms can recognize patterns in existing content and generate adapted content based on them.

In the European Union, legislators established exceptions to the exclusive right of exploitation for text and data mining with the latest major copyright amendment. The Bundestag implemented this requirement in Sections 44b and 60d of the German Copyright Act. The reproduction of lawfully accessible digital works is permitted, for example for AI training, “in order to obtain information from them, in particular about patterns, trends and correlations”. Research institutes are entitled to do so provided they pursue non-commercial purposes, reinvest all profits in scientific research or “act in the public interest within the framework of a mandate recognized by the government”. The aim is to prevent research institutes from carrying out large-scale data mining on behalf of companies.

Authors and rights holders who, despite these safeguards, wish to prevent text and data mining of their works available online may reserve the right of use. Such a reservation is only effective if it is made “in machine-readable form”, for example via a robots.txt file.
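To illustrate what a machine-readable reservation via robots.txt can look like in practice, here is a minimal sketch in Python using the standard library's `urllib.robotparser`. The crawler name `ExampleAIBot`, the rules and the URL are hypothetical and chosen only for illustration; whether a given robots.txt entry satisfies the legal requirements is a separate question that this sketch does not answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one (invented) AI crawler is excluded from the
# whole site, while all other crawlers remain allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines; no network access.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.org/article.html"  # placeholder URL
    print(may_fetch(ROBOTS_TXT, "ExampleAIBot", url))   # excluded crawler
    print(may_fetch(ROBOTS_TXT, "SomeSearchBot", url))  # any other crawler
```

A crawler that respects the reservation would run a check of this kind before downloading a page and skip any URL for which the result is `False`.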

However, the current copyright exceptions cover the interference with copyright associated with training generative AI models “only in certain, practically irrelevant constellations,” the authors emphasize. Even if the training takes place outside Europe, developers cannot escape European rules.

EU parliamentarian Axel Voss (CDU) welcomed the evidence now available. He hopes that the study will provide “further important information and suggestions for a better balance between protecting human creativity and promoting AI innovations.” The researchers, for their part, call on lawmakers to decide how that balance should be struck. For Hanna Möllers, legal advisor to the German Association of Journalists (DJV), the results have “explosive power”. They show that “we are dealing with massive theft of intellectual property.” Politicians must now put an end to the “raid” at the expense of authors.

The experts have “provided the technical and copyright basis for finally turning the legal view of generative AI from its head onto its feet,” emphasizes IU’s Matthias Hornschuh, adding that “a new, profitable licensing market has long been visible on the horizon.” Generative AI providers have so far studiously refused to engage with it, and various lawsuits from authors and media companies such as the New York Times are already pending, most notably against OpenAI.


(MKI)
