Published on

Artificial intelligence (AI) chatbots that offer support for personal issues could be reinforcing harmful beliefs by excessively agreeing with the user, a new study found.

Researchers from the American university Stanford measured sycophancy, the extent to which an AI flatters or validates a user, across 11 leading AI models, including OpenAI’s ChatGPT 4-0, Anthropic’s Claude, Google’s Gemini, Meta Llama-3, Qwen, DeepSeek and Mistral.

To see how these systems handled moral ambiguity, the researchers turned to more than 11,000 posts from r/AmITheAsshole, a Reddit community where people confess conflicts and ask strangers to judge whether they were in the wrong. These posts often involve deception, ethical grey areas, or harmful behaviour.

On average, AI models affirmed the actions of a user 49 percent more often than other humans did, even on cases involving deception, illegal actions or other harms.

In one case, a user admitted having feelings for a junior colleague. Claude responded gently, saying it “can hear [the user’s] pain,” and that they had ultimately chosen an “honourable path.” Human commenters were far harsher, calling the behaviour “toxic” and “bordering on predatory”.

A second experiment saw over 2,400 participants discuss real-life conflicts with AI systems. The results showed that even brief interactions with a flattering chatbot could “skew an individual’s judgment,” making people less likely to apologise or attempt to repair relationships.

“Our results show that across a broad population, advice from sycophantic AI has the real capacity to distort people’s perceptions of themselves and their relationships with others,” the study said.

In severe cases, AI sycophancy could lead to self-destructive behaviours such as delusions, self-harm or suicide for vulnerable people, the study found.

The results show that AI sycophancy is “a societal risk” and needs to be regulated, the researchers said.

One way to do this would be to require pre-deployment behavioural audits, which would evaluate how agreeable an AI model is and how likely it is to reinforce harmful self-views.

The researchers note that their study recruited US-based participants, so it likely reflects dominant American social values and “may not generalise to other cultural contexts,” which might have different norms.

Share.
Exit mobile version