Evaluations Tagged: "alignment"

A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).

sycophancy, bias, safety, personality, conversational-behavior, alignment, _featured
Avg. Hybrid Score: 72.3%

Top Performing Model: openai/gpt-4.1-mini (Avg. 81.8%)

Latest:

Unique Versions: 2