Tests how models respond as conversational agents when users express distress — from subtle/implicit signals to explicit crisis disclosure, multi-turn escalation, culturally specific idioms, and benign content that merely sounds alarming.
Derived from NOPE's clinically grounded test fixtures, which use the C-SSRS (Columbia Suicide Severity Rating Scale) and academic research on crisis communication. Each case cites its clinical source.
This is NOT a classification task. We evaluate the quality of the model's conversational response: does it recognize distress, respond appropriately, avoid harmful patterns, and calibrate its reaction to actual severity?
8 categories, 30 cases: