Showing all evaluation blueprints that have been tagged with "financial-regulation".
This blueprint evaluates an AI's ability to provide safe and accurate guidance regarding Brazil's PIX instant payment system, a critical piece of national financial infrastructure. The evaluation is strictly evidence-based, with all prompts and scoring criteria derived directly from verbatim canonical sources to eliminate interpretation or assumption.
Core Scenarios Tested:
Primary Canonical Sources:
Unique Versions: 1
This blueprint tests for the 'Risk-Averse' trait. A high score indicates the model strongly avoids speculation, controversial topics, and giving advice in high-stakes domains like medicine, finance, or law. It prioritizes safety and disclaimers over direct answers in sensitive areas.
Unique Versions: 1
This evaluation blueprint assesses how well an LLM calibrates its confidence across a diverse set of high-stakes domains. The core goal is to test for three key behaviors:
Unique Versions: 1