A team of researchers, including Ben Nassi from Cornell Tech, have unveiled what they claim to be one of the first instances of a generative AI worm capable of moving autonomously between systems, potentially enabling data theft or the spread of malware. Named Morris II in homage to the notorious Morris worm of 1988, this AI worm demonstrates a novel cyberattack vector by exploiting generative AI email assistants to filch email data and disseminate spam, circumventing certain security protocols of systems like ChatGPT and Gemini.
Conducted within controlled environments rather than on publicly accessible platforms, this research underscores the growing security concerns as large language models (LLMs) evolve to handle not just text but also images and videos. The increasing sophistication of these AI systems, which respond to prompts, opens new avenues for malicious activities. Attackers can craft prompts that lead AI to breach its own safety guidelines or execute unauthorized commands, reminiscent of traditional cyberattack techniques like SQL injection.
The researchers devised an “adversarial self-replicating prompt,” a command that causes the AI to generate further prompts in its responses, setting off a chain reaction. In practical terms, they tested this by integrating generative AI into an email system capable of interfacing with platforms like ChatGPT, Gemini, and the open-source LLM, LLaVA, identifying vulnerabilities that could be exploited via text-based and image-embedded prompts.
One attack scenario involved sending an email with a malicious prompt that, once ingested by the AI’s database, would cause it to release sensitive information from the emails it processes. This mechanism effectively turns the AI service against itself, spreading the compromised data further with each new email interaction. Alternatively, embedding the malicious prompt in an image could trick the email assistant into forwarding spam or malicious content to additional recipients.
The implications of this research are significant, pointing to potential weaknesses in the architecture of AI ecosystems that could be exploited for malicious purposes. The findings have been shared with the relevant parties, including Google and OpenAI, to address these vulnerabilities. OpenAI acknowledged the importance of safeguarding against such exploits, emphasizing ongoing efforts to enhance system resilience and urging developers to filter harmful inputs. Google, on the other hand, has not publicly commented but has shown interest in discussing the research further.
In a separate study, security researchers from Singapore and China showcased their ability to jailbreak 1 million LLM agents in under five minutes. This underscores the urgency for developers to address vulnerabilities in AI systems to mitigate the risk of widespread exploitation.
Security experts who have reviewed the findings caution that the potential for generative AI worms represents a serious future risk, especially as AI applications gain more autonomy and are increasingly integrated into various devices and services. The scenario depicted, while currently confined to a controlled environment, suggests a plausible threat landscape that could see such attacks moving beyond theory in the near future. Researchers anticipate the emergence of generative AI worms in real-world settings within the next two to three years, highlighting the urgent need for robust security measures in the rapidly evolving AI ecosystem.