A child of about two years touches the fingers of his newborn sister. This image is included in a dataset called LAION-5B, along with the names of the two girls and the hospital where the photo was taken. Human Rights Watch found about 170 photographs of Brazilian children in the dataset, which is used, among other things, to train AI models. According to the organization, this is only a fraction of such photos. It criticizes the fact that the children did not consent to this and warns that the images could be misused.
“Children should not fear that their photos will be stolen and used against them,” said Hye Jung Han, a children’s rights advocate at Human Rights Watch. In a blog post, she called on governments to enact laws as soon as possible to protect children’s data from misuse by AI.
LAION-5B is one of several datasets used for AI training. To build them, content is scraped from the Internet, i.e. collected and processed. Undesirable and criminal content, for example, is sorted and flagged by low-paid workers. The extent to which consent is required for this processing is regulated differently around the world, or remains unclear. On the one hand there is the question of copyright in the data; on the other, the issue of data protection and the processing of personal data.
Human Rights Watch analyzed only 0.0001 percent of the 5.85 billion images in LAION-5B, including their captions. Among them it also found photos of births, of birthdays, and of children dancing in underwear. Many of these photos were originally visible only to a small group of people, the activists write, and could not be found through search engines. Some of the images were uploaded many years before LAION-5B existed. There are also concerns about AI applications: models trained on the photos may reproduce them one-to-one or nearly identically in their outputs.
LAION is a German non-profit organization. It has announced that it will remove all known material of this kind from the dataset. Furthermore, according to Human Rights Watch, the association states that children and their parents are responsible for removing private images of children from the Internet, calling this the most effective protection against abuse.
Questionable use of all content from the Internet
Many website operators are now trying to exclude crawlers from their sites to protect their content. Meta, for example, collects the images its users upload and uses them to train its own AI models. It is currently obtaining permission for this via a notice about a change to its privacy policy, an approach that consumer and data protection advocates criticize and want stopped.
Google also says that it uses all the content available on the Internet. OpenAI is mostly silent about the origin of its training data. However, CTO Mira Murati said that all freely available data flows into the Sora video AI, including data from Meta’s platforms, i.e. Facebook and Instagram. About YouTube she was not so sure, or so she said. Google has complained that if OpenAI used videos from the platform, this would violate its terms of use. To be able to keep using articles, OpenAI has concluded contracts with some publishers. The New York Times has prominently complained that OpenAI used its copyrighted articles without permission.
(EMW)