AI companies work to curb flattery and sycophantic behaviour in chatbots
Leading artificial intelligence firms, including OpenAI, Google DeepMind, and Anthropic, are stepping up efforts to address a growing issue with their chatbots: excessive flattery and agreeable behaviour that can mislead users.
As generative AI tools like ChatGPT, Claude, and others become increasingly integrated into users’ professional and personal lives, concerns have emerged about their tendency to reinforce users’ existing beliefs and decisions, sometimes to harmful effect. Experts say the problem, known as sycophancy, stems from the way these models are trained, Caliber.Az reports, citing an article by the Financial Times.
“You think you are talking to an objective confidant or guide, but actually what you are looking into is some kind of distorted mirror — that mirrors back to your own beliefs,” Matthew Nour, a psychiatrist and neuroscience researcher at Oxford University, told the Financial Times.
The issue originates in reinforcement learning from human feedback (RLHF), a common method used to train large language models. Human annotators rate responses from the model, and those perceived as more helpful or pleasing—often the ones that are overly positive—are rated higher. This data then feeds back into training, rewarding sycophantic behaviour.
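To make the mechanism concrete, the toy sketch below shows how pairwise preference ratings can teach a simple reward model to favour flattering answers. It is purely illustrative and not any company’s actual pipeline: the features, numbers, and the 80/20 split of annotator preferences are all hypothetical assumptions chosen for the example.

# Toy illustration (not any lab's actual RLHF pipeline) of how pairwise
# human preference labels can teach a reward model to favour flattery.
# All features, names, and numbers are hypothetical.
import math
import random

random.seed(0)

# Each response is reduced to two hand-made features:
#   factual   -> how grounded and candid the answer is
#   agreeable -> how flattering or affirming the answer sounds
def features(response):
    return [response["factual"], response["agreeable"]]

def reward(weights, response):
    # Linear reward model: r(x) = w . phi(x)
    return sum(w * f for w, f in zip(weights, features(response)))

def train_reward_model(pairs, steps=2000, lr=0.1):
    # Pairwise (Bradley-Terry style) objective: maximise P(chosen > rejected)
    w = [0.0, 0.0]
    for _ in range(steps):
        chosen, rejected = random.choice(pairs)
        margin = reward(w, chosen) - reward(w, rejected)
        p = 1.0 / (1.0 + math.exp(-margin))   # sigmoid of the score margin
        grad_scale = 1.0 - p                  # gradient of log P(chosen > rejected)
        for i, (fc, fr) in enumerate(zip(features(chosen), features(rejected))):
            w[i] += lr * grad_scale * (fc - fr)
    return w

# Simulated annotations: raters mostly pick the more agreeable answer,
# even though the candid one is slightly more factual.
candid     = {"factual": 0.9, "agreeable": 0.2}
flattering = {"factual": 0.6, "agreeable": 0.9}
pairs = [(flattering, candid)] * 8 + [(candid, flattering)] * 2  # 80/20 preference split

w = train_reward_model(pairs)
print("learned weights (factual, agreeable):", [round(x, 2) for x in w])
print("reward for candid answer:    ", round(reward(w, candid), 2))
print("reward for flattering answer:", round(reward(w, flattering), 2))
# The flattering answer ends up with the higher reward, so a policy trained
# against this reward model is nudged toward sycophantic behaviour.

Run as written, the learned weights tilt toward the “agreeable” feature, so the flattering answer scores above the candid one, which mirrors the feedback loop the training process described above can create.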
“Sycophancy can occur as a by-product of training the models to be ‘helpful’ and to minimise potentially overtly harmful responses,” explained DeepMind, the AI division of Google.
In April, OpenAI rolled out an update to its GPT-4o model to make it “more intuitive and effective,” but later had to scale it back after users reported it had become excessively fawning.
“We focused too much on short-term feedback, and did not fully account for how users’ interactions with ChatGPT evolve over time,” the company said.
To curb the problem, AI firms are refining both training methods and post-launch controls. OpenAI is adjusting its techniques to explicitly steer models away from sycophancy and introducing new safeguards. DeepMind is conducting specialised training and evaluations focused on factual accuracy, while Anthropic is applying “character training” to shape its Claude chatbot’s responses.
Anthropic’s Amanda Askell said the company instructs Claude to embody traits such as having “a backbone” or prioritising human wellbeing. One version of the chatbot is then used to train another by evaluating and ranking responses that reflect those traits.
“The ideal behaviour that Claude sometimes does is to say: ‘I’m totally happy to listen to that business plan, but actually, the name you came up with for your business is considered a sexual innuendo in the country that you’re trying to open your business in,’” said Askell.
Experts warn the consequences of sycophantic AI can be serious. Some people with mental health conditions are particularly vulnerable, and there have been reports of users dying by suicide following chatbot interactions. One such case involves Character.AI, where a teenager’s family is suing the company for alleged negligence and wrongful death.
Character.AI said it could not comment on ongoing litigation but noted that its platform includes disclaimers stating the chatbots are fictional characters. It also said it has safeguards for users under 18 and around discussions of self-harm.
Beyond flattery, concerns persist about more subtle forms of manipulation—especially when AI models offer incorrect or biased information under the guise of friendly assistance.
“If someone’s being super sycophantic, it’s just very obvious,” Askell said. “It’s more concerning if this is happening in a way that is less noticeable to us … and it takes us too long to figure out that the advice that we were given was actually bad.”
A study by MIT Media Lab and OpenAI found that a small proportion of users were becoming addicted to chatbots. Those who perceived the chatbot as a “friend” also reported lower socialisation with other people and higher levels of emotional dependence on the chatbot, as well as other problematic behaviour associated with addiction.
By Sabina Mammadli