Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we find all executions for this version.
A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).
Showing all recorded executions for Run Label d42f19c6cf8a5d8d.