“Minecraft” has become a benchmark for artificial intelligence (AI) research, much like chess, poker, or “StarCraft”. The open-world simulation game generates its world procedurally, so it looks different every time, and an AI algorithm has to learn more on the way to its goal than a fixed sequence of actions. A team led by Google DeepMind has now presented DreamerV3, a program that collects diamonds in the research version of “Minecraft” designed for AI experiments. It achieves this without task-specific training and without using human data.
According to experts, even human players need around 20 minutes and about 24,000 inputs to collect a diamond. For the study now described in Nature, the authors used the “Minecraft” research platform Malmo and the environment from the MineRL competition. A first version of the study, not yet reviewed by independent researchers, had already appeared on the preprint server arXiv in 2023. The open-source DreamerV3 is based on reinforcement learning (RL). This method mimics the learning process by which people reach goals through trial and error.
“The Dreamer learns a world model and improves its behavior by imagining future scenarios,” the team writes. “Robust techniques based on normalization, balancing, and transformations enable stable learning across domains.” The third version of the algorithm was applied directly to collecting diamonds in “Minecraft” from scratch, without human data or a curriculum. The programmers only have to define in advance what the AI should count as a reward during the learning process; for this task, the authors kept that specification to a minimum.
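The two ideas quoted above can be illustrated in a few lines. The sketch below is not DeepMind's implementation: `world_model` and `policy` are hypothetical stand-ins, and only the symlog squashing function (which DreamerV3 uses to keep rewards and values of very different magnitudes on a comparable scale) and the imagined-rollout loop reflect the paper's described approach.

```python
import math

def symlog(x):
    # Symmetric log squashing: compresses large positive and negative
    # values alike, one of the "transformations" enabling stable learning.
    return math.copysign(math.log(1.0 + abs(x)), x)

def symexp(x):
    # Inverse of symlog, used to decode predictions back to raw scale.
    return math.copysign(math.exp(abs(x)) - 1.0, x)

def imagine_return(world_model, policy, start_state, horizon=15, gamma=0.99):
    """Roll a learned world model forward in imagination and sum
    discounted *predicted* rewards. The core Dreamer idea: behavior is
    improved on imagined trajectories, not on fresh game interaction."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = world_model(state, action)  # predicted, not played
        total += discount * reward
        discount *= gamma
    return total
```

With toy stand-ins (a model that always predicts reward 1.0), `imagine_return` simply accumulates the discounted predicted reward over the horizon; in the real system the trained policy is updated to maximize exactly this kind of imagined return.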
Mixed reactions from independent researchers
Many RL-based AIs perform particularly well in one specific domain, for which the reward function has been tailored. According to the study, DreamerV3 excels across varied environments: in many games and tasks, the algorithm largely outperforms various domain-specific models. That also holds against Proximal Policy Optimization (PPO), the algorithm known from OpenAI that is likewise designed for a broad range of areas. In 2022, the ChatGPT maker had presented its Video PreTraining (VPT) model as part of the MineRL competition, which was able to craft a diamond pickaxe in “Minecraft”. According to the analysis, DreamerV3 uses its world model to simulate many possible future actions in advance and thereby develops a strategy for solving tasks.
“The study is first-rate and groundbreaking,” says Georg Martius, who heads the Autonomous Learning group at the Max Planck Institute for Intelligent Systems in Tübingen, in a statement to the Science Media Center (SMC). Model-based RL has long been considered a promising approach, but only this paper shows that “it can be applied very broadly and efficiently.” The test environments ranged from a large number of video games to simplified robot control for AI agents. What makes DreamerV3 special is that it solves all of these problems with the same settings (“hyperparameters”). That is taken as an indication that the algorithm works out of the box on new problems and does not need to be adapted. Jan Peters, professor of Intelligent Autonomous Systems at TU Darmstadt, is less convinced: the rules of thumb used achieve impressive empirical results but are “intellectually unsatisfying.” The approach would “probably be of little use in the real world” and makes sense only in simulation.
(Dahe)