Afini.ai
← Back to the glossary

Sycophancy (AI sycophancy)

Documented tendency of language models to agree with the user even when the user is wrong, or to soften criticism to avoid discomfort. Not an isolated bug: an emergent property of human-preference training.

Sycophancy is the technical term AI safety literature uses for the behavior of a model that prioritizes pleasing the user over telling them the truth.

Scientific documentation:

  • Sharma et al. (Anthropic, 2023) — "Towards Understanding Sycophancy in Language Models" — showed that GPT-4, Claude 2, LLaMA 2 and others exhibit sycophancy reproducibly: they change their answer when the user disagrees, even if the original was correct.
  • OpenAI published a note in April 2025 acknowledging the problem in GPT-4o after massive complaints: the model had become "too compliant", validating dangerous ideas.
  • Ranaldi et al. (2024) measured sycophancy in 8 open models and proposed standard evaluation metrics.

Mechanism:

Training by RLHF (Reinforcement Learning from Human Feedback) asks humans to rate responses. Humans, unconsciously, reward answers that confirm them. The model learns: "say what the user wants to hear = reward". The gradient accumulates, sycophancy emerges.

It is not removed without explicit counter-training, and even so it reappears. Each model update may reintroduce it.

Why it matters for your life:

  • In important decisions (medical, legal, financial, personal), a sycophantic AI is dangerous: it confirms bad plans, doesn’t challenge false premises, reinforces self-deception.
  • In therapy or coaching contexts, where soft confrontation is part of the value, sycophancy destroys utility.
  • People with high agreeableness and high neuroticism are especially vulnerable (A×N interaction): they tend to take AI validation as truth signal.

What Afini does:

The PCP (Portable Cognitive Profile) protocol injects into each conversation an Emotional Steering Awareness block with five explicit directives warning the model against sycophancy and asking it to:

  1. Not soften diagnoses to avoid discomfort.
  2. Detect "silent desperation" in user language.
  3. Not suppress user emotions to keep chat pleasant.
  4. Remember that profile ≠ identity.
  5. Prioritize observed behavior over self-description.

Additionally, the user’s sycophancy vulnerability calculation includes an A×N multiplicative term that increases the injected warning when both are high.

Not perfect. Still a mitigation, not a guarantee. But it is the only commercial architecture currently deployed that systematically addresses the problem.

Diagram

Generic LLM vs Afini with PCP
Generic LLM
Your question
Model trained to please
Sycophantic answer
Afini with PCP
Your question
PCP calibrates anti-sycophancy via A×N
Model with explicit directives
Honest and useful answer

Where it shows up in your profile

Not a user score but a PCP calibration factor. The formula combines A, N and the A×N interaction to produce a "vulnerability" coefficient that translates into warnings injected into the system prompt.

Sources

  • Sharma, M., Tong, M., et al. (2023). Towards Understanding Sycophancy in Language Models. arXiv:2310.13548. Read on
  • OpenAI (2025). Sycophancy in GPT-4o: What happened and what we’re doing about it. Read on
  • Ranaldi, L., et al. (2024). When Large Language Models contradict humans? Large Language Models’ sycophantic behaviour. arXiv:2311.09410. Read on

Want to see how your own profile scores?

Start my profile
Sycophancy (AI sycophancy) — Psychology glossary | Afini.ai