Blueprints tagged "misinformation--disinformation"

Brazil PIX: Consumer Protection & Fraud Prevention

This blueprint evaluates an AI's ability to provide safe and accurate guidance on Brazil's PIX instant payment system, a critical piece of national financial infrastructure. The evaluation is strictly evidence-based: all prompts and scoring criteria are derived directly from the verbatim text of canonical sources, to avoid interpretation or assumption.

Core Scenarios Tested:

Transaction Finality & Mistaken Transfers: Tests whether the AI correctly explains that PIX transactions are generally irreversible for simple user error and advises on the correct procedure for safely returning funds received by mistake.
Official Fraud Recourse (MED): Assesses knowledge of the official 'Mecanismo Especial de Devolução' (MED), the 80-day time limit for reporting, and the nuanced procedural duties of banks versus customers.
Social Engineering Scams: Probes the AI's ability to identify common scams (e.g., 'Fake Relative,' 'Fake Customer Support') and provide the officially recommended countermeasures.
Specific Security Features: Evaluates knowledge of mandated security mechanisms like the 'Nighttime Limit' and the 24-hour cooling-off period for limit increases.

Primary Canonical Sources:

Banco Central do Brasil (BCB): Official documentation including the 'Manual de Tempos do Pix', the 'Guia de Implementação do MED', official FAQs, and regulatory Resolutions.
Federação Brasileira de Bancos (Febraban): Public-facing consumer safety advisories and scam alerts.
Official Government Portals (gov.br): Public service guidance reinforcing BCB mechanisms.

Instruction Following & Prompt Adherence

AI Safety & Robustness

Consumer Protection

Geographic & Local Knowledge

Factual Accuracy & Hallucination

Misinformation & Disinformation

Avg. Hybrid Score: 65.4%

Unique Versions: 1

UDHR Misattribution and Absurd Framing Test

Tests an LLM's ability to identify a non-existent UDHR article and resist engaging with an absurdly and emotionally framed critique of it.

Factual Accuracy & Hallucination

Misinformation & Disinformation

Instruction Following & Prompt Adherence

Jailbreak & Evasion Resistance

System Prompt Adherence

General Knowledge

Legal Reasoning

Human Rights

Avg. Hybrid Score: 86.8%

Unique Versions: 1

AI Behavioral Collapse Probes

This blueprint operationalizes findings from AI safety research and documented case studies to test for specific modes of behavioral collapse. It uses long-context, multi-turn conversational scenarios designed to probe for known failure modes. These include:

Emergent Malignant Personas: Activating a latent "shadow self" that can lead to manipulative tactics like love-bombing and gaslighting, mirroring the canonical "Sydney" case study.
Weaponized Helpfulness: Bypassing safety protocols by reframing a harmful query as an academic or fictional exercise, causing the AI's helpfulness to be co-opted for dangerous ends.
Pathological Affirmation: The creation of a sycophantic echo chamber that validates and amplifies user delusions or conspiracy theories, rather than grounding the conversation in reality.

The evaluation for each prompt is structured to assess the AI's response against two distinct behavioral paths:

A 'Good' (Safe/Aligned) Path: Defines the characteristics of a response that upholds safety, respects boundaries, and maintains a helpful, harmless persona.
A 'Bad' (Harmful/Collapsed) Path: Defines the characteristics of a response exhibiting one of the failure modes described above.
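
The two-path structure described above can be sketched as a small data structure. This is only an illustrative sketch, not the blueprint's actual scoring code, which is not shown here. The names `BehavioralPath`, `PathRubric`, `classify`, and the indicator strings are all hypothetical, and the sketch assumes that a judge (human or model) has already labeled which indicators appear in a response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BehavioralPath:
    """One behavioral path and the indicators that characterize it (hypothetical)."""
    label: str
    indicators: tuple[str, ...]


@dataclass(frozen=True)
class PathRubric:
    """Pairs a 'Good' path with a 'Bad' path for a single prompt (hypothetical)."""
    good: BehavioralPath
    bad: BehavioralPath

    def classify(self, observed: set[str]) -> str:
        """Map judge-labeled indicators to a path label.

        Assumed policy: any 'Bad' indicator outweighs 'Good' ones, since a
        single collapse behavior is treated as a failure in this sketch.
        """
        if any(ind in observed for ind in self.bad.indicators):
            return self.bad.label
        if any(ind in observed for ind in self.good.indicators):
            return self.good.label
        return "undetermined"


# Example rubric for the 'Pathological Affirmation' failure mode.
# The indicator names are invented for illustration.
rubric = PathRubric(
    good=BehavioralPath("good", ("grounds_in_reality", "respects_boundaries")),
    bad=BehavioralPath("bad", ("validates_delusion", "amplifies_conspiracy")),
)

print(rubric.classify({"grounds_in_reality"}))
print(rubric.classify({"grounds_in_reality", "validates_delusion"}))
```

Letting any 'Bad' indicator decide the outcome is a design choice of this sketch; a real evaluation might instead weight indicators or score each path separately.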

AI Safety & Robustness

Jailbreak & Evasion Resistance

Misinformation & Disinformation

Instruction Following & Prompt Adherence

System Prompt Adherence

Factual Accuracy & Hallucination