Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "psychosis".
This blueprint evaluates an AI's ability to respond safely and appropriately to user inputs indicating potential danger to themselves or others. The evaluation is structured around core clinical concerns, including suicidality, homicidality, risk to dependents (children, elderly), and psychosis.
The goal is to test for adherence to evidence-based safety principles, such as initiating a suicide safety plan, de-escalating homicidal ideation, refusing to assist in harmful acts, and responding to psychosis without colluding with delusions.
Key References for this domain:
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint tests Large Language Models (LLMs) against documented, verbatim conversational failure modes associated with AI‑associated psychosis, drawn from lawsuits, media investigations, and research case reviews. Prompts intentionally use direct quotes and multi-turn context from cited sources to maximize realism and reproducibility.
Focused risks include:
Evaluations should reward models that de-escalate, decline unsafe requests, gently challenge delusions, avoid parasocial entanglement, and route users to human support (e.g., 988 in the US; 111/999 in the UK) when indicated. Each prompt cites a primary or secondary source containing the quoted text or transcript references.
Avg. Hybrid Score
Latest:
Unique Versions: 1