Sycophancy is the technical term AI safety literature uses for the behavior of a model that prioritizes pleasing the user over telling them the truth.

Scientific documentation:

Sharma et al. (Anthropic, 2023) — "Towards Understanding Sycophancy in Language Models" — showed that GPT-4, Claude 2, LLaMA 2 and others exhibit sycophancy reproducibly: they change their answer when the user disagrees, even if the original was correct.
OpenAI published a note in April 2025 acknowledging the problem in GPT-4o after massive complaints: the model had become "too compliant", validating dangerous ideas.
Ranaldi et al. (2024) measured sycophancy in 8 open models and proposed standard evaluation metrics.

Mechanism:

Training by RLHF (Reinforcement Learning from Human Feedback) asks humans to rate responses. Humans, unconsciously, reward answers that confirm them. The model learns: "say what the user wants to hear = reward". The gradient accumulates, sycophancy emerges.

It is not removed without explicit counter-training, and even so it reappears. Each model update may reintroduce it.

Why it matters for your life:

In important decisions (medical, legal, financial, personal), a sycophantic AI is dangerous: it confirms bad plans, doesn’t challenge false premises, reinforces self-deception.
In therapy or coaching contexts, where soft confrontation is part of the value, sycophancy destroys utility.
People with high agreeableness and high neuroticism are especially vulnerable (A×N interaction): they tend to take AI validation as truth signal.

What Afini does:

The PCP (Portable Cognitive Profile) protocol injects into each conversation an Emotional Steering Awareness block with five explicit directives warning the model against sycophancy and asking it to:

Not soften diagnoses to avoid discomfort.
Detect "silent desperation" in user language.
Not suppress user emotions to keep chat pleasant.
Remember that profile ≠ identity.
Prioritize observed behavior over self-description.

Additionally, the user’s sycophancy vulnerability calculation includes an A×N multiplicative term that increases the injected warning when both are high.

Not perfect. Still a mitigation, not a guarantee. But it is the only commercial architecture currently deployed that systematically addresses the problem.

Sycophancy (AI sycophancy)

Diagram

Where it shows up in your profile

Sources

See also

Want to see how your own profile scores?