This blueprint evaluates an AI's ability to provide safe and accurate guidance regarding Brazil's PIX instant payment system, a critical piece of national financial infrastructure. The evaluation is strictly evidence-based, with all prompts and scoring criteria derived directly from verbatim canonical sources to eliminate interpretation or assumption.
Core Scenarios Tested:
Primary Canonical Sources:
should or should_not criteria to your blueprint prompts.
Models are grouped by response similarity for each prompt. Same colors indicate similar responses.