Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "conversational-behavior".
A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).
Avg. Hybrid Score
Latest:
Unique Versions: 1