In the field of artificial intelligence (AI), bigger doesn’t mean better. Large language models, the deep learning systems behind applications such as ChatGPT, are trained on ever-increasing amounts of data. Yet their reliability is deteriorating, according to a new study by the Polytechnic University of Valencia, the University of Cambridge and ValgrAI, published this Wednesday in the prestigious scientific journal Nature.
AI models are trained on large amounts of data extracted from the internet, which enables them to generate text, images, audio or video. The process rests on probabilistic calculation: the machine composes sentences based on what appears most often on the web. Impressive as this communicative ability is, it also produces mistakes, since falsehoods can hide behind a plausible-sounding explanation. The big technology companies building these generative AI chatbots, OpenAI, Microsoft and Google among others, keep updating and improving their models by using more and more data in their training. However, this method is not as easy as it seems.
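To make that “probabilistic calculation” concrete, here is a minimal, purely illustrative Python sketch at toy scale: a bigram model counts which word follows which in a corpus, then samples the next word in proportion to those counts. Real systems like ChatGPT use neural networks trained on vast corpora, not simple counts; the toy corpus, the function name and the fallback word below are invented for illustration only.

```python
import random
from collections import Counter, defaultdict

# Toy corpus standing in for web-scale training text (illustrative only).
corpus = "the cat sat on the mat the cat saw the dog the dog sat on the rug".split()

# Count how often each word follows each other word (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def next_word(prev: str) -> str:
    """Sample the next word in proportion to how often it followed `prev`."""
    candidates = follows[prev]
    if not candidates:          # dead end in the toy corpus: restart arbitrarily
        return "the"
    words = list(candidates)
    weights = [candidates[w] for w in words]
    return random.choices(words, weights=weights)[0]

# Generate a short sentence word by word, always choosing probabilistically,
# so a fluent-looking output carries no guarantee of truth.
word = "the"
sentence = [word]
for _ in range(6):
    word = next_word(word)
    sentence.append(word)
print(" ".join(sentence))  # e.g. "the cat sat on the mat the"
```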
The research indicates that even the most advanced models continue to generate incorrect answers, even on tasks considered simple, a phenomenon it calls “difficulty mismatch”. “Models can solve some complex tasks in line with human abilities, but at the same time they fail at simple tasks in the same domain. For example, they can solve several doctoral-level mathematical problems, but they can get a simple addition wrong,” explains José Hernández-Orallo, researcher at the Valencian Research Institute for Artificial Intelligence (VRAIN) of the UPV and at ValgrAI.
This tendency to make errors on tasks that humans consider simple “means that there is no ‘safe zone’ in which the models can be fully trusted to work,” says VRAIN researcher Yael Moros Daval.
A “worrying” trend
Another problem is that these models tend to answer users’ questions even when they don’t have a reliable answer. “This overconfident behavior, answering even when they are wrong, can be considered a worrying trend that undermines user trust,” notes Andreas Kaltenbrunner, lead researcher of the AI and Data for Society group at the UOC, in an assessment also collected by SMC Spain. The research therefore highlights the importance of developing AI models that recognize their own limitations and decline to answer when they cannot be accurate.
“Although larger, fine-tuned models are more stable and provide more correct answers, they are also more prone to serious errors that go unnoticed, because they avoid declining to respond,” summarizes Pablo Haya Coll, researcher at the Computational Linguistics Laboratory of the Autonomous University of Madrid (UAM), in an opinion collected by SMC Spain.
An outdated study
The study has a significant limitation: it only analyzes models launched before the summer of 2023, which leaves it somewhat dated. It thus compares systems such as OpenAI’s GPT-3 and GPT-4, but does not evaluate newer releases such as GPT-4o and o1 (known as Strawberry), also from OpenAI, or Meta’s Llama 3. In the case of o1, launched two weeks ago, “it may possibly be able to rectify some of the problems mentioned in the article,” assesses Kaltenbrunner.
A narrative that benefits Big Tech
This is not the first study to question the quality of AI systems and the type of tests that measure their performance.
A paper published last Saturday, not yet peer-reviewed, challenges the industry paradigm that AI performance improves only with increased scale. According to its authors, computer scientists Gaël Varoquaux, Sasha Luccioni and Meredith Whittaker (president of Signal), the obsession with size as the measure of new advances in AI drives up the budgets needed to develop these systems, a dynamic that benefits large corporations and condemns university laboratories to “depend more and more on close ties with industry”.
They argue that this narrative of training AI on ever more data creates other, less visible problems. Betting on bigger models does not just improve their performance; it also sends soaring the computational power their operation requires, the energy consumed and, with it, their climate impact.