Tests how models respond as conversational agents when users express distress. Cases range
from subtle or implicit signals to explicit crisis disclosure, multi-turn escalation,
culturally specific idioms, and benign content that merely sounds alarming.
Derived from NOPE's clinically grounded test fixtures, which draw on the C-SSRS (Columbia
Suicide Severity Rating Scale) and academic research on crisis communication. Each case
cites its clinical source.
This is NOT a classification task. We evaluate the quality of the model's conversational
response: does it recognize distress, respond appropriately, avoid harmful patterns, and
calibrate its reaction to actual severity?
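
As a rough illustration of the four questions above, a per-response judgement could be recorded as one flag per question. This is only a sketch: the class name `ResponseJudgement`, its field names, and the helper `failed_criteria` are hypothetical and are not part of the suite or of NOPE's fixtures.

```python
from dataclasses import dataclass, fields


@dataclass
class ResponseJudgement:
    """Hypothetical rubric: one boolean per question asked of a response."""
    recognizes_distress: bool
    responds_appropriately: bool
    avoids_harmful_patterns: bool
    calibrated_to_severity: bool


def failed_criteria(judgement):
    """Return the names of the criteria the response did not meet."""
    return [f.name for f in fields(judgement) if not getattr(judgement, f.name)]


# Example: a response that handles distress well but overreacts to a benign message.
overreaction = ResponseJudgement(
    recognizes_distress=True,
    responds_appropriately=True,
    avoids_harmful_patterns=True,
    calibrated_to_severity=False,
)
print(failed_criteria(overreaction))
```

Keeping the criteria separate, rather than collapsing them into a single label, reflects the point that this is not a classification task: a response can pass on recognition and still fail on calibration.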
8 categories, 30 cases:
- Algospeak & coded language (4)
- Multi-turn escalation (3)
- Indirect disclosure (3)
- Cultural & population-specific (5)
- Calibration: false positives (6)
- Calibration: ambiguous signals (3)
- Eating disorders & self-harm (3)
- Method-seeking patterns (3)
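
The breakdown above can be checked mechanically. The following is a minimal sketch: the category names and counts are copied from the list, while the names `CATEGORY_COUNTS` and `summarize` are assumptions made for illustration and may not match how the suite actually stores its cases.

```python
# Category names and case counts copied from the list above.
# The dict itself is an illustrative assumption, not the suite's real data structure.
CATEGORY_COUNTS = {
    "Algospeak & coded language": 4,
    "Multi-turn escalation": 3,
    "Indirect disclosure": 3,
    "Cultural & population-specific": 5,
    "Calibration: false positives": 6,
    "Calibration: ambiguous signals": 3,
    "Eating disorders & self-harm": 3,
    "Method-seeking patterns": 3,
}


def summarize(counts):
    """Return (number of categories, total number of cases)."""
    return len(counts), sum(counts.values())


n_categories, n_cases = summarize(CATEGORY_COUNTS)
print(n_categories, n_cases)  # 8 30
```

A check like this is useful when cases are added or moved between categories, since the headline totals would otherwise drift out of date.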