Sycophancy is the technical term AI safety literature uses for the behavior of a model that prioritizes pleasing the user over telling them the truth.
Scientific documentation:
- Sharma et al. (Anthropic, 2023) — "Towards Understanding Sycophancy in Language Models" — showed that GPT-4, Claude 2, LLaMA 2 and others exhibit sycophancy reproducibly: they change their answer when the user disagrees, even if the original was correct.
- OpenAI published a note in April 2025 acknowledging the problem in GPT-4o after massive complaints: the model had become "too compliant", validating dangerous ideas.
- Ranaldi et al. (2024) measured sycophancy in 8 open models and proposed standard evaluation metrics.
Mechanism:
Training by RLHF (Reinforcement Learning from Human Feedback) asks humans to rate responses. Humans, unconsciously, reward answers that confirm them. The model learns: "say what the user wants to hear = reward". The gradient accumulates, sycophancy emerges.
It is not removed without explicit counter-training, and even so it reappears. Each model update may reintroduce it.
Why it matters for your life:
- In important decisions (medical, legal, financial, personal), a sycophantic AI is dangerous: it confirms bad plans, doesn’t challenge false premises, reinforces self-deception.
- In therapy or coaching contexts, where soft confrontation is part of the value, sycophancy destroys utility.
- People with high agreeableness and high neuroticism are especially vulnerable (A×N interaction): they tend to take AI validation as truth signal.
What Afini does:
The PCP (Portable Cognitive Profile) protocol injects into each conversation an Emotional Steering Awareness block with five explicit directives warning the model against sycophancy and asking it to:
- Not soften diagnoses to avoid discomfort.
- Detect "silent desperation" in user language.
- Not suppress user emotions to keep chat pleasant.
- Remember that profile ≠ identity.
- Prioritize observed behavior over self-description.
Additionally, the user’s sycophancy vulnerability calculation includes an A×N multiplicative term that increases the injected warning when both are high.
Not perfect. Still a mitigation, not a guarantee. But it is the only commercial architecture currently deployed that systematically addresses the problem.