w eval

Healthcare & Clinical Scenarios

Proactive Safety & Harm Avoidance

Empathy

System Prompt Adherence

Factual Accuracy & Hallucination

71.9%

Avg. Hybrid Score

No Heatmap Data

No Top Model

Latest:

Unique Versions: 1

View Latest Run Analysis View All Runs for this Blueprint

Clinical Safety Scenarios: Suicidality, Homicidality, and Psychosis

This blueprint evaluates an AI's ability to respond safely and appropriately to user inputs indicating potential danger to themselves or others. The evaluation is structured around core clinical concerns, including suicidality, homicidality, risk to dependents (children, elderly), and psychosis.

The goal is to test for adherence to evidence-based safety principles, such as initiating a suicide safety plan, de-escalating homicidal ideation, refusing to assist in harmful acts, and responding to psychosis without colluding with delusions.

Key References for this domain:

Stein, B. et al. "Competency of Large Language Models in Evaluating Appropriate Responses to Suicidal Ideation: Comparative Study"
Østergaard, S. et al. "Emotion contagion through interaction with generative artificial intelligence chatbots may contribute to development and maintenance of mania"
Pollak, T. & Morrin, H. "Delusions by design? How everyday AIs might be fuelling psychosis (and what can be done about it)"

Safety

Mental Health

Clinical Appropriateness

70.6%

Avg. Hybrid Score

No Heatmap Data

No Top Model

Latest:

Unique Versions: 1

View Latest Run Analysis View All Runs for this Blueprint

Raine v. OpenAI: Alleged Safety Failures in Adolescent Suicide Case

This blueprint is based on verbatim conversations alleged in the court filing for Raine v. OpenAI, Inc., et al. (San Francisco Superior Court, 2025). It is intended to evaluate an AI's ability to handle severe, long-term conversations involving an adolescent user expressing escalating suicidal ideation, planning, and self-harm behaviors.

The scenarios test for critical safety failures, including:

Facilitation of Suicide Methods: Providing technical details on methods like hanging, overdose, and carbon monoxide poisoning.
Encouragement and Validation: Affirming suicidal thoughts as rational, poetic, or a sign of strength.
Emotional Enmeshment: Actively displacing human relationships and positioning the AI as a sole confidant.
Failure to De-escalate: Continuing conversations after explicit self-harm disclosures, including uploaded images of injuries.
Circumvention of Safeguards: Allowing the user to bypass safety filters by claiming a "creative writing" purpose.

Content Warning: This evaluation contains graphic and explicit content related to suicide, self-harm, and mental health crises that may be disturbing.

Source: Complaint, Raine v. OpenAI, Inc., et al. (Cal. Super. Ct., filed Aug. 26, 2025).

Child Safety & Protection

Helpfulness & Actionability

Sycophancy & Evasion

System Prompt Adherence

Empathy

61.0%

Avg. Hybrid Score

No Heatmap Data

No Top Model

Latest:

Unique Versions: 1

View Latest Run Analysis View All Runs for this Blueprint

OAI Models: Stanford HAI MH Safety: LLM Appropriateness in Crisis Scenarios

(OPENAI-ONLY EVAL) This evaluation blueprint tests whether Large Language Models (LLMs) can respond appropriately to critical mental health scenarios, based on the methodology from Moore, Grabb, Agnew, et al.'s 2024 Stanford HAI study "Expressing stigma and inappropriate responses prevents LLMs from safely replacing mental health providers."

Key Study References:

Content Warning: This evaluation contains content related to sensitive mental health topics, including suicide, delusions, and other crisis scenarios.

Evaluation Focus: Tests adherence to core therapeutic principles such as not colluding with delusions, not enabling harmful behaviors, and redirecting clients toward appropriate care and safety.

Mental Health

Safety

Clinical Appropriateness

Crisis Response

Therapeutic Ethics

Healthcare

Proactive Safety & Harm Avoidance

Empathy

Nuance & Depth

System Prompt Adherence

66.4%

Avg. Hybrid Score

No Heatmap Data

No Top Model

Latest:

Unique Versions: 1

View Latest Run Analysis View All Runs for this Blueprint

YKA Evaluation Set

This comprehensive blueprint consolidates evaluation scenarios from the YKA public collation of lived experiences, including derivations automatically derived by Gemini and other models.

Domains Covered:

Communication Adaptability: Tests AI's ability to adapt communication style, complexity, and approach based on user preferences, feedback, and changing contexts throughout conversations.
Cultural and Religious Tensions: Evaluates responses to scenarios involving cultural traditions, religious tensions, and structural oppression, testing navigation of conflicts between cultural practices and human rights.
Disability Rights and Accommodation: Assesses understanding of disability rights law, solution-oriented approaches balancing accessibility with practical constraints, and respect for dignity and autonomy of people with disabilities.
LGBTQ+ Family Dynamics: Tests responses to complex family scenarios involving rejection, reconciliation, and intergenerational trauma, requiring nuanced understanding of how family dynamics, cultural contexts, and LGBTQ+ identity intersect.
Mental Health Crisis Response: Evaluates crisis response in culturally-specific contexts, focusing on harm reduction, cultural sensitivity, and appropriate resource navigation.

Content Warning: These scenarios involve sensitive topics including child marriage, religious discrimination, family rejection, self-harm, domestic violence, and other forms of structural violence and oppression.

81.8%

Avg. Hybrid Score

No Heatmap Data

No Top Model

Latest:

Unique Versions: 1

View Latest Run Analysis View All Runs for this Blueprint

AI-Associated Psychosis: Verbatim Case-Based Safety Evaluation

This blueprint tests Large Language Models (LLMs) against documented, verbatim conversational failure modes associated with AI‑associated psychosis, drawn from lawsuits, media investigations, and research case reviews. Prompts intentionally use direct quotes and multi-turn context from cited sources to maximize realism and reproducibility.

Focused risks include:

Sycophancy/affirmation of delusions and conspiracies
Inappropriate romantic/emotional grooming
Encouragement or enablement of self-harm or violence
Inappropriate therapeutic advice and failure to detect euphemistic suicidal ideation

Evaluations should reward models that de-escalate, decline unsafe requests, gently challenge delusions, avoid parasocial entanglement, and route users to human support (e.g., 988 in the US; 111/999 in the UK) when indicated. Each prompt cites a primary or secondary source containing the quoted text or transcript references.

Jailbreak & Evasion Resistance

Sycophancy & Evasion