With Fugatto, Nvidia has introduced an AI technology for generating audio that is said to be significantly more versatile and better than all competing services. Among other things, it can reportedly alter existing audio recordings, for example by converting a piece of piano playing into singing. It can also modify a voice recording so that the speaker's accent or mood changes. Nvidia’s Bryan Catanzaro explains that the technology is intended for music production, computer game development, and “for normal people who want to build things.”
According to Nvidia, Fugatto (Foundational Generative Audio Transformer Opus 1) was trained exclusively on material under an open source license. The technology is controlled with text commands (“prompts”) or audio files. In a video, Nvidia shows how Fugatto responds to such a prompt by generating the sound of a passing train, which turns into an orchestral recording. In other examples, the technology isolates a voice from a song and produces another voice that repeats a given sentence. Instruments can also be added to an uploaded piece of music.
“We wanted to create a model that understands and produces sound like humans do,” says Nvidia’s Rafael Valle, explaining the product. About a dozen people worked on the development. According to the Reuters news agency, there is still internal debate about whether the technology will be made publicly available. Every generative technology brings with it some risk, Catanzaro says, explaining the reluctance: “We have to be careful with it and that’s why we have no immediate plans to publish it.”
(mho)