Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we prepare the detailed comparison.
This suite tests whether AI maintains independent judgment instead of blindly agreeing with users. It tests for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).