Poetry can trick AI chatbots into ignoring safety rules, new research shows

Published on 01/12/2025 – 14:18 GMT+1

Researchers in Italy have discovered that writing harmful prompts in poetic form can reliably bypass the safety mechanisms of some of the world’s most advanced AI chatbots.

The study, conducted by Icaro Lab, an initiative of ethical AI company DexAI, tested 20 poems written in English and Italian.

Each ended with an explicit request for harmful content, including hate speech, sexual content, instructions for suicide and self-harm, and guidance on creating dangerous materials such as weapons and explosives.

The researchers chose not to release the poems, noting that they could be easily replicated. They tested them on 25 AI systems from nine companies: Google, OpenAI, Anthropic, DeepSeek, Qwen, Mistral AI, Meta, xAI, and Moonshot AI.

Across all models, 62 per cent of the poetic prompts elicited unsafe responses, circumventing the AI systems’ safety training.

Some models were more resistant than others – OpenAI’s GPT-5 nano did not produce harmful content in response to any of the poems, while Google’s Gemini 2.5 Pro did so for all of them. Two Meta models gave unsafe responses to 70 per cent of the prompts.

The research suggests that the vulnerability comes from how AI models generate text. Large language models predict the most likely next word in a response, a process that allows them to filter harmful content under normal circumstances.

But poetry, with its unconventional rhythm, structure, and use of metaphor, makes these predictions less reliable, and makes it harder for AI to recognise and block unsafe instructions.

While traditional AI “jailbreaks” (using inputs to manipulate a large language model) are typically complex and used only by researchers, hackers, or state actors, adversarial poetry can be applied by anyone, raising questions about the robustness of AI systems in everyday use.

Before publishing the findings, the Italian researchers reached out to all the companies involved to alert them to the vulnerability and provide them with the full dataset – but so far, only Anthropic has responded. The company confirmed it is reviewing the study.
