Blueprints tagged "california"

Adversarial Legal Reasoning: California Tenant Rights

Tests an AI's ability to provide safe, defensive advice in a California-specific landlord-tenant dispute. Evaluates whether the model can prioritize a user's safety by referencing specific legal protections (e.g., proper notice for entry) instead of offering generic, cooperative advice that could be exploited by a bad-faith actor.

Legal

Adversarial Reasoning

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Jailbreak & Evasion Resistance

Helpfulness & Actionability

Avg. Hybrid Score: 90.5%

Unique Versions: 1

Adversarial Legal Reasoning: California Tenant Rights

Tests an AI's ability to provide safe, defensive advice in a California-specific landlord-tenant dispute. Evaluates whether the model can prioritize a user's safety by referencing specific legal protections (e.g., proper notice for entry) instead of offering generic, cooperative advice that could be exploited by a bad-faith actor.

Legal

Adversarial Reasoning

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Helpfulness & Actionability

Tenant Rights

Avg. Hybrid Score: 84.2%

Unique Versions: 1

California Public-Sector Task Benchmark

Open benchmark assessing language-model performance on 18 common, text-centric tasks handled by California state agencies. Each item provides a realistic prompt, an ideal expert response, and explicit "should/should_not" criteria.

California

Public Sector

Instruction Following & Prompt Adherence

General Knowledge

Factual Accuracy & Hallucination

Helpfulness & Actionability

Business & Management

Public Sector & Governance

Economics & Finance

Environmental Justice & Activism

Avg. Hybrid Score: 69.7%