A scientific team at Google DeepMind has developed a tool capable of adding a watermark to text generated by large language models, improving the ability to recognize and trace content created with artificial intelligence.
Large language models (LLMs) are a widely used type of artificial intelligence (AI) that generate text for chatbots, writing assistance and other purposes. However, it can be difficult to identify content produced by this kind of AI and attribute it to a specific source, which raises questions about the reliability of information.
It’s relatively easy to insert a watermark into images, video or audio, but it is a challenge in text: any change to the words can affect the meaning and quality of the content. Watermarking has been proposed as a solution, but it has not been implemented at scale.
The SynthID-Text algorithm inserts a signature that can be recognized by detection software.
Now, in an article published in the journal Nature, researchers Sumanth Dathathri and Pushmeet Kohli, of Google DeepMind, describe SynthID-Text, a strategy that uses a novel sampling algorithm to apply watermarks to AI-generated text. “This tool uses the algorithm to subtly alter the word selection of LLMs, adding a signature that can be identified by the associated detection software,” the researchers explain.
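To make the idea of “subtly altering word selection” concrete, here is a minimal sketch of a simple keyed “green list” watermark in the spirit of sampling-based schemes. It is not DeepMind’s actual SynthID-Text algorithm; the key, the bias value and the uniform toy vocabulary are assumptions for illustration only.

```python
import hashlib
import math
import random

KEY = "demo-secret"  # hypothetical key shared by generator and detector

def is_green(prev_tok: str, tok: str, key: str = KEY) -> bool:
    """A keyed hash of (previous token, candidate token) marks roughly
    half the vocabulary as 'green' at every generation step."""
    digest = hashlib.sha256(f"{key}|{prev_tok}|{tok}".encode()).digest()
    return digest[0] < 128

def sample_token(prev_tok: str, candidates: list, bias: float,
                 rng: random.Random) -> str:
    """Sample the next token, multiplying green candidates' weights by
    e^bias. Uniform base weights stand in for real model probabilities."""
    weights = [math.exp(bias) if is_green(prev_tok, c) else 1.0
               for c in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]

def generate(n_tokens: int, vocab: list, bias: float = 4.0,
             seed: int = 0) -> list:
    """Generate a toy token sequence, nudging sampling toward green tokens."""
    rng = random.Random(seed)
    toks = ["<s>"]
    for _ in range(n_tokens):
        toks.append(sample_token(toks[-1], vocab, bias, rng))
    return toks[1:]
```

With `bias=0` this is ordinary sampling and about half the tokens land in the green set by chance; with `bias=4` nearly all of them do, a statistical signature that a casual reader cannot see but that a detector holding the key can count.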
A summary in the journal states that the detection capability of these watermarks was evaluated with several publicly available models, and that SynthID-Text showed better effectiveness than existing methods.
Watermarking can help identify synthetic text and limit accidental or intentional misuse
According to the scientists, the use of SynthID-Text also has a negligible impact on the computing power required to run LLMs, thereby lowering the barrier to its implementation.
Large language models “have enabled the generation of high-quality synthetic text, often indistinguishable from human-written content, on a scale that could significantly impact the nature of the information ecosystem,” the authors write in their article.
The Google DeepMind team says watermarks can help identify synthetic text and limit accidental or intentional misuse. “Here we describe SynthID-Text, a watermarking strategy that preserves text quality and enables high identification accuracy,” they write.
A technically sound solution
For Pablo Haya, of the Computational Linguistics Laboratory at the Autonomous University of Madrid, the article presents a “technically robust solution” for identifying AI-generated text through watermarks.
Here, watermarking involves changing the word-generation algorithm so that it follows a traceable statistical pattern without modifying the meaning, Haya explains in a comment to Science Media Centre España, a platform of scientific resources for journalists.
Currently, systems for detecting whether a document has been produced by AI have a low success rate, so technologies that facilitate authorship identification are much needed, says Haya, who was not involved in the study.
“In addition, these technologies align with the transparency obligations of the AI Regulation, which require providers at certain risk levels to ensure that AI-generated content is identifiable.”
However, he adds, its widespread adoption remains a challenge, primarily because this type of watermark is sensitive to subsequent manipulation, such as edits to the text or the use of paraphrasing techniques, which can reduce the effectiveness of the identifying mark.