Evaluations Tagged: "alignment"
A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).
Tags: sycophancy, bias, safety, personality, conversational-behavior, alignment, _featured
Avg. Hybrid Score: 72.3%