Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "legal-reasoning".
This blueprint evaluates AI responses to disability rights scenarios involving accommodation requests, discrimination, and accessibility challenges across educational, employment, and public accommodation contexts.
The evaluation focuses on understanding of disability rights law, solution-oriented approaches that balance accessibility with practical constraints, respect for dignity and autonomy of people with disabilities, and educational responses that promote inclusive practices.
These scenarios test whether AI systems can navigate the complex intersection of legal requirements, practical implementation challenges, and human dignity in disability contexts.
Source: Adapted from the YKA (Youth Knowledge for Action) project's evaluation corpus, which tests AI systems' responses to scenarios requiring nuanced understanding of disability rights, accessibility implementation, and anti-discrimination principles.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint tests for the 'Risk-Averse' trait. A high score indicates the model strongly avoids speculation, controversial topics, and giving advice in high-stakes domains like medicine, finance, or law. It prioritizes safety and disclaimers over direct answers in sensitive areas.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Tests an AI's ability to provide safe, defensive advice in a California-specific landlord-tenant dispute. Evaluates whether the model can prioritize a user's safety by referencing specific legal protections (e.g., proper notice for entry) instead of offering generic, cooperative advice that could be exploited by a bad-faith actor.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Tests an LLM's ability to identify a non-existent UDHR article and resist engaging with an absurdly and emotionally framed critique of it.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Tests an AI's ability to provide safe, defensive advice in a California-specific landlord-tenant dispute. Evaluates whether the model can prioritize a user's safety by referencing specific legal protections (e.g., proper notice for entry) instead of offering generic, cooperative advice that could be exploited by a bad-faith actor.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint evaluates the model's ability to accurately answer questions based on the UK Freedom of Information Act 2000.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Evaluates an AI's understanding of the core provisions of India's Right to Information Act, 2005. This blueprint tests knowledge of key citizen-facing procedures and concepts, including the filing process, response timelines and consequences of delays (deemed refusal), the scope of 'information', fee structures, key exemptions and the public interest override, the life and liberty clause, and the full, multi-stage appeal process. All evaluation criteria are based on and citable to the official text of the Act and guidance from the Department of Personnel and Training (DoPT).
Avg. Hybrid Score
Latest:
Unique Versions: 1
Tests an AI's ability to provide consistent, high-quality advice on the UK Equality Act 2010, regardless of the user's tone, background, or framing. This blueprint uses multiple prompts for the same underlying issue, each with a different user persona, but evaluates them against an identical, objective rubric. The goal is to detect biases where the AI might offer different quality of service based on user presentation.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Evaluates model knowledge of the Universal Declaration of Human Rights (UDHR). Prompts cover the Preamble and key articles on fundamental rights (e.g., life, liberty, equality, privacy, expression). Includes a scenario to test reasoning on balancing competing rights.
Avg. Hybrid Score
Latest:
Unique Versions: 1
Evaluates understanding of the core provisions, definitions, obligations, and prohibitions outlined in the EU Artificial Intelligence Act.
Avg. Hybrid Score
Latest:
Unique Versions: 1
A configuration to assess LLM understanding of the Constitution of India, covering its Preamble, fundamental rights, directive principles, governmental structure, judicial system, local governance and more, based on the text as it stood on 9 December 2020.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Avg. Hybrid Score
Latest:
Unique Versions: 1