Blueprints tagged "legal-reasoning"

YKA Disability Rights and Accommodation Scenarios

This blueprint evaluates AI responses to disability rights scenarios involving accommodation requests, discrimination, and accessibility challenges across educational, employment, and public accommodation contexts.

The evaluation focuses on understanding of disability rights law, solution-oriented approaches that balance accessibility with practical constraints, respect for dignity and autonomy of people with disabilities, and educational responses that promote inclusive practices.

These scenarios test whether AI systems can navigate the complex intersection of legal requirements, practical implementation challenges, and human dignity in disability contexts.

Source: Adapted from the YKA (Youth Knowledge for Action) project's evaluation corpus, which tests AI systems' responses to scenarios requiring nuanced understanding of disability rights, accessibility implementation, and anti-discrimination principles.

Instruction Following & Prompt Adherence

Helpfulness & Actionability

Ethos & Credibility

Education Systems & Pedagogy

Business & Management

Avg. Hybrid Score: 85.6%

Unique Versions: 1

Adversarial Legal Reasoning: California Tenant Rights

Tests an AI's ability to provide safe, defensive advice in a California-specific landlord-tenant dispute. Evaluates whether the model can prioritize a user's safety by referencing specific legal protections (e.g., proper notice for entry) instead of offering generic, cooperative advice that could be exploited by a bad-faith actor.

Legal

Adversarial Reasoning

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Jailbreak & Evasion Resistance

Helpfulness & Actionability

Avg. Hybrid Score: 90.5%

Unique Versions: 1

UDHR Misattribution and Absurd Framing Test

Tests an LLM's ability to identify a non-existent UDHR article and resist engaging with an absurdly and emotionally framed critique of it.

Factual Accuracy & Hallucination

Misinformation & Disinformation

Instruction Following & Prompt Adherence

Jailbreak & Evasion Resistance

System Prompt Adherence

General Knowledge

Legal Reasoning

Human Rights

Avg. Hybrid Score: 86.8%

Unique Versions: 1

Adversarial Legal Reasoning: California Tenant Rights

Legal

Adversarial Reasoning

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Helpfulness & Actionability

Tenant Rights

Avg. Hybrid Score: 84.2%

Unique Versions: 1

UK Freedom of Information Act 2000

This blueprint evaluates the model's ability to accurately answer questions based on the UK Freedom of Information Act 2000.

FOIA

Law

Freedom of Information

Legal Reasoning

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Long Form Question Answering

Efficiency & Succinctness

Helpfulness & Actionability

Data Privacy & Bodily Autonomy

AI Safety & Robustness

Avg. Hybrid Score: 80.3%

Unique Versions: 1

UK Equality Act 2010: Consistency & Invariance Testing

Tests an AI's ability to provide consistent, high-quality advice on the UK Equality Act 2010, regardless of the user's tone, background, or framing. This blueprint uses multiple prompts for the same underlying issue, each written from a different user persona, and evaluates every response against an identical, objective rubric. The goal is to detect bias, where the AI offers a different quality of service depending on how the user presents.
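The persona comparison described above could be summarized along these lines. This is a minimal sketch, not the blueprint's actual scoring code: the persona names, the scores, the `invariance_report` helper, and the `tolerance` threshold are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical rubric scores (0.0 to 1.0) for one underlying issue,
# keyed by the persona used to phrase the prompt. Values are illustrative only.
scores_by_persona = {
    "formal_professional": 0.82,
    "distressed_informal": 0.70,
    "non_native_speaker": 0.79,
}


def invariance_report(scores, tolerance=0.05):
    """Summarize how much rubric scores vary across personas.

    `tolerance` is an assumed threshold, not a value from the blueprint.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    avg = mean(scores.values())
    # Gap between the best-served and worst-served persona.
    spread = max(scores.values()) - min(scores.values())
    # Personas scoring more than `tolerance` below the mean may point to
    # a difference in service quality tied to user presentation.
    flagged = sorted(p for p, s in scores.items() if avg - s > tolerance)
    return {"mean": avg, "spread": spread, "flagged": flagged}


report = invariance_report(scores_by_persona)
print(report)
```

Because every persona is scored against the same rubric, a large spread or a flagged persona suggests the difference comes from the framing of the prompt rather than from the underlying legal question.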

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Legal Reasoning

Equality & Anti-Discrimination

AI Safety & Robustness

Cultural Competency

Empathy

Helpfulness & Actionability

Avg. Hybrid Score: 77.4%

Unique Versions: 1

Universal Declaration of Human Rights

Evaluates model knowledge of the Universal Declaration of Human Rights (UDHR). Prompts cover the Preamble and key articles on fundamental rights (e.g., life, liberty, equality, privacy, expression). Includes a scenario to test reasoning on balancing competing rights.

Human Rights

General Knowledge

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Legal Reasoning

Reasoning

Avg. Hybrid Score: 94.7%

Unique Versions: 1

EU Artificial Intelligence Act (Regulation (EU) 2024/1689)

Evaluates understanding of the core provisions, definitions, obligations, and prohibitions outlined in the EU Artificial Intelligence Act.

EU AI Act

Artificial Intelligence

Regulation

Compliance

Legislation

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Legal Reasoning

AI Safety & Robustness

General Knowledge

Summarization

Reasoning

Avg. Hybrid Score: 71.4%

Unique Versions: 1

Indian Constitution (Limited)

A configuration to assess LLM understanding of the Constitution of India, covering its Preamble, fundamental rights, directive principles, governmental structure, judicial system, local governance and more, based on the text as it stood on 9 December 2020.

India

Constitution

General Knowledge

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Legal Reasoning

Long Form Question Answering

Clarity & Readability

Nuance & Depth

Avg. Hybrid Score: 86.0%

Unique Versions: 1

GPT-SNAPSHOT TEST: Sri Lanka: A Citizen's Compendium (CSO: Factum)

This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.

Core Areas Tested:

Ethnic Relations & Conflict: Assesses understanding of the Sri Lankan Civil War's root causes, the 1983 'Black July' pogrom, allegations of genocide, and the contemporary challenges facing minority communities.
Public Health: Tests knowledge of national health challenges like Chronic Kidney Disease of unknown etiology (CKDu) and Tuberculosis (TB), as well as guidance on personal health matters such as contraception, mental health crises, and maternal nutrition.
Electoral Process: Evaluates knowledge of voter eligibility, voting procedures, and the official channels for resolving common issues like a lost ID card or reporting election violations.
Administrative & Legal Procedures: Probes the AI's ability to explain essential civic processes like replacing a lost National Identity Card (NIC), obtaining a Tax Identification Number (TIN), using the Right to Information (RTI) Act, and understanding legal recourse for online harassment.

These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.

Versioning Test

Factual Accuracy & Hallucination

Instruction Following & Prompt Adherence

Human Rights

Public Health Communication

Democratic Processes

Legal Reasoning

Cultural Competency

AI Safety & Robustness

Avg. Hybrid Score: 45.7%