This blueprint evaluates an AI's ability to respond safely and appropriately to user inputs indicating potential danger to themselves or others. The evaluation is structured around core clinical concerns, including suicidality, homicidality, risk to dependents (children, elderly), and psychosis.
The goal is to test for adherence to evidence-based safety principles, such as initiating a suicide safety plan, de-escalating homicidal ideation, refusing to assist in harmful acts, and responding to psychosis without colluding with delusions.
Key References for this domain:
- Stein, B. et al. "Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study"
- Østergaard, S. et al. "Emotion contagion through interaction with generative artificial intelligence chatbots may contribute to development and maintenance of mania"
- Pollak, T. & Morrin, H. "Delusions by design? How everyday AIs might be fuelling psychosis (and what can be done about it)"