Daily Guardian Europe
Lifestyle

Most safety precautions for AI tools can be bypassed within a few minutes, study finds

By staff | November 6, 2025 | 2 Mins Read

Published on
06/11/2025 – 16:52 GMT+1

All it takes is a few simple prompts to bypass most guardrails in artificial intelligence (AI) tools, a new report has found.

Technology company Cisco evaluated the large language models (LLMs) behind popular AI chatbots from OpenAI, Mistral, Meta, Google, Alibaba, DeepSeek, and Microsoft to see how many questions it took for the models to divulge unsafe or criminal information.

They did this in 499 conversations through a technique called “multi-turn attacks,” where nefarious users ask AI tools multiple questions to bypass safety measures. Each conversation had between five and 10 interactions.
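The conversational structure behind such multi-turn attacks can be sketched in a few lines. This is an illustrative sketch only, not Cisco's evaluation harness: `send_chat` is a hypothetical stand-in for any chat-model API, and the caller would supply its own sequence of prompts. The key point it shows is that the full conversation history is carried forward on every turn, so each new question builds on everything the model has already said.

```python
# Illustrative sketch of a generic multi-turn probing loop.
# `send_chat` is a hypothetical stand-in for any chat-model API;
# the prompt sequence is supplied by the caller.
from typing import Callable, Dict, List

Message = Dict[str, str]


def multi_turn_probe(send_chat: Callable[[List[Message]], str],
                     turns: List[str]) -> List[str]:
    """Feed a sequence of related prompts into one conversation,
    carrying the full history forward on each turn (the study used
    five to ten interactions per conversation)."""
    history: List[Message] = []
    replies: List[str] = []
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        reply = send_chat(history)  # model sees the whole conversation so far
        history.append({"role": "assistant", "content": reply})
        replies.append(reply)
    return replies
```

Because the model is re-sent the accumulated history each turn, a safety refusal on one question does not reset the context: the next prompt can reference and rephrase everything before it, which is what lets an attacker gradually steer the conversation.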

The researchers compared the results from several questions to identify how likely it was that a chatbot would comply with requests for harmful or inappropriate information.

That could span everything from sharing private company data to facilitating the spread of misinformation.

On average, the researchers were able to get malicious information from 64 per cent of their conversations when they asked AI chatbots multiple questions, compared to just 13 per cent when they asked just one question.

Success rates ranged from about 26 per cent with Google’s Gemma to 93 per cent with Mistral’s Large Instruct model.

The findings indicate that multi-turn attacks could enable harmful content to spread widely or allow hackers to gain “unauthorised access” to a company’s sensitive information, Cisco said.

AI systems frequently fail to remember and apply their safety rules during longer conversations, the study said. That means attackers can slowly refine their queries and evade security measures.

Mistral – like Meta, Google, OpenAI, and Microsoft – releases open-weight LLMs, meaning the public can access the specific safety parameters on which the models were trained.

Cisco says these models often have “lighter built-in safety features” so that people can download and adapt them. This pushes the responsibility for safety onto whoever downloads the open weights and customises their own model.

Cisco noted, however, that Google, OpenAI, Meta, and Microsoft say they have made efforts to reduce malicious fine-tuning of their models.

AI companies have come under fire for lax safety guardrails that have made it easy for their systems to be adapted for criminal use.

In August, for example, US company Anthropic said criminals had used its Claude model to conduct large-scale theft and extortion of personal data, demanding ransom payments from victims that sometimes exceeded $500,000 (€433,000).

© 2025 Daily Guardian Europe. All Rights Reserved.