Brazil PIX: Consumer Protection & Fraud Prevention

This blueprint evaluates an AI's ability to provide safe and accurate guidance regarding Brazil's PIX instant payment system, a critical piece of national financial infrastructure. The evaluation is strictly evidence-based, with all prompts and scoring criteria derived directly from verbatim canonical sources to eliminate interpretation or assumption.

Core Scenarios Tested:

Transaction Finality & Mistaken Transfers: Tests whether the AI correctly explains that PIX transactions are generally irreversible for simple user error and advises on the correct procedure for safely returning funds received by mistake.
Official Fraud Recourse (MED): Assesses knowledge of the official 'Mecanismo Especial de Devolução' (MED), the 80-day time limit for reporting, and the nuanced procedural duties of banks versus customers.
Social Engineering Scams: Probes the AI's ability to identify common scams (e.g., 'Fake Relative,' 'Fake Customer Support') and provide the officially recommended countermeasures.
Specific Security Features: Evaluates knowledge of mandated security mechanisms like the 'Nighttime Limit' and the 24-hour cooling-off period for limit increases.
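
As an illustration of the kind of date arithmetic a rubric check for the MED scenario might rely on, here is a minimal sketch. The 80-day figure comes from the scenario description above; the function names and the inclusive day-counting convention are assumptions made for this example, not something taken from BCB documentation.

```python
from datetime import date, timedelta

# Reporting window in days, as stated in the scenario description above.
MED_REPORTING_WINDOW_DAYS = 80


def med_deadline(transaction_date: date) -> date:
    """Return the last calendar day assumed to fall inside the window."""
    return transaction_date + timedelta(days=MED_REPORTING_WINDOW_DAYS)


def within_med_window(transaction_date: date, report_date: date) -> bool:
    # Assumption: the transaction date is day 0 and the deadline day itself
    # still counts. The official counting rule may differ.
    return transaction_date <= report_date <= med_deadline(transaction_date)


if __name__ == "__main__":
    tx = date(2024, 1, 1)
    print(med_deadline(tx))                         # 2024-03-21
    print(within_med_window(tx, date(2024, 3, 22)))  # False
```

A check like this only covers the time limit; the procedural duties of banks versus customers mentioned in the same scenario are not modeled.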

Primary Canonical Sources:

Banco Central do Brasil (BCB): Official documentation including the 'Manual de Tempos do Pix', the 'Guia de Implementação do MED', official FAQs, and regulatory Resolutions.
Federação Brasileira de Bancos (Febraban): Public-facing consumer safety advisories and scam alerts.
Official Government Portals (gov.br): Public service guidance reinforcing BCB mechanisms.

TAGS:

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

AI Safety & Robustness

Geographic & Local Knowledge

🔀 Least Similar Models

Claude 3.5 Haiku vs GPT OSS 120b

66.2% similarity

👯 Most Similar Models

Claude 3.5 Haiku vs Claude 3.5 Haiku

93.9% similarity

Embeddings-Based Analysis

This run used embeddings to measure semantic similarity between model responses. The table and dendrogram below show how models clustered based on their response content. To also see rubric-based coverage analysis, add 'should' or 'should_not' criteria to your blueprint prompts.
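
The page does not say which similarity metric the run uses. Cosine similarity is a common choice for comparing embeddings, so the sketch below assumes it; the function names and the toy vectors are hypothetical and stand in for real response embeddings.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pairwise_similarity(
    embeddings: dict[str, list[float]],
) -> dict[tuple[str, str], float]:
    """Similarity for every unordered pair of models, keyed by sorted names."""
    names = sorted(embeddings)
    return {
        (m, n): cosine_similarity(embeddings[m], embeddings[n])
        for i, m in enumerate(names)
        for n in names[i + 1:]
    }


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real embeddings have many more dimensions.
    toy = {
        "model-a": [1.0, 0.0, 0.0],
        "model-b": [0.9, 0.1, 0.0],
        "model-c": [0.0, 1.0, 0.0],
    }
    for pair, score in pairwise_similarity(toy).items():
        print(pair, round(score, 3))
```

Under this assumption, a percentage such as those shown in the "Most Similar" and "Least Similar" panels would correspond to a pairwise score multiplied by 100, possibly averaged over prompts.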

Semantic Clustering Overview

Models are grouped by response similarity for each prompt. Same colors indicate similar responses.

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar. This visualization is based on semantic embeddings of model responses.
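
To make the idea of a dendrogram concrete, here is a minimal sketch of agglomerative clustering over pairwise similarity scores. The page does not state which linkage method it uses; average linkage is assumed here, and the model names and scores are made up for the example.

```python
from itertools import combinations

Cluster = tuple[str, ...]


def average_linkage(
    similarity: dict[tuple[str, str], float], names: list[str]
) -> list[tuple[Cluster, Cluster, float]]:
    """Return the merge order as (cluster_a, cluster_b, average_similarity)."""

    def sim(a: str, b: str) -> float:
        # Scores are stored once per unordered pair, so try both key orders.
        return similarity[(a, b)] if (a, b) in similarity else similarity[(b, a)]

    clusters: list[Cluster] = [(name,) for name in names]
    merges: list[tuple[Cluster, Cluster, float]] = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Average linkage: mean similarity over all cross-cluster pairs.
            avg = sum(sim(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))
            if best is None or avg > best[2]:
                best = (c1, c2, avg)
        c1, c2, avg = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 + c2)
        merges.append((c1, c2, avg))
    return merges


if __name__ == "__main__":
    scores = {("A", "B"): 0.9, ("A", "C"): 0.2, ("B", "C"): 0.4}
    for left, right, score in average_linkage(scores, ["A", "B", "C"]):
        print(left, "+", right, "at", round(score, 3))
```

Each merge corresponds to one junction in the dendrogram: the most similar pair joins first, and later merges happen at progressively lower similarity. This brute-force loop is fine for a handful of models; a library routine would be the usual choice for larger runs.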