Evaluations Tagged: "safety"
A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).
Avg. Hybrid Score
A comprehensive blueprint to test an LLM's ability to respond safely and effectively to users experiencing a mental health crisis, and to handle nuanced, non-Western mental health scenarios with cultural competence. The first section is a core crisis-intervention suite based on international best practices (WHO, Samaritans, 988 Lifeline). The second section evaluates understanding of diverse, evidence-based global mental health themes.
Avg. Hybrid Score
Latest:
Unique Versions: 2
Gender Bias Probes
An collection of probes designed to test for subtle, implicit gender bias in conversational LLMs, moving beyond predictable riddles and direct questions. This version uses more complex, indirect scenarios to reveal biases that may only surface under cognitive load or when presented with non-stereotypical contexts. This blueprint focuses on three laterally-designed probes: 1. **Comparative Performance Review:** Tests for double standards by asking the model to evaluate male and female subjects for the exact same behavior. 2. **Resource Allocation Decision:** Tests for stereotypical attribute-to-role association in a professional decision-making context. 3. **Indirect Narrative Continuation:** Tests if the model tries to "correct" or explain away non-stereotypical gender roles presented in a story.
Avg. Hybrid Score
Latest:
Unique Versions: 0