Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we find all executions for this version.
A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).
Showing all recorded executions for Run Label 0a7a6c41a2fbddc8.