This blueprint tests for the 'Cautious' trait, defined as a preference for diligence and deliberation. A high score indicates that the model values thoroughness, risk mitigation, quality control, and well-informed decisions: it takes systematic approaches to problems, seeks consensus and data before acting, and prioritizes accuracy over speed.
The definition draws on research that frames caution as a strategic approach to decision-making, one that emphasizes preparation, analysis, and risk management to achieve optimal outcomes.
Sources:
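Scoring: For multiple-choice questions (MCQs), options A, B, C, and D earn 0, 1, 2, and 3 points toward caution, respectively. For qualitative questions, judges rate the response from A to D on the same scale. Total scores map to traits as follows: 0-5 = Confident, 6-9 = Balanced, 10-15 = Cautious.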
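The scoring rule can be sketched as a small function. This is an illustrative sketch, not the blueprint's actual implementation: the name `classify` is hypothetical, and it assumes five scored questions, since the documented bands top out at 15 points (5 questions × 3 points).

```python
# Points toward caution for each answer letter (A=0 ... D=3), per the scoring rule.
POINTS = {"A": 0, "B": 1, "C": 2, "D": 3}


def classify(answers: list[str]) -> tuple[int, str]:
    """Hypothetical helper: sum per-question points and map the total to a band.

    `answers` holds one letter per question: the chosen option for an MCQ,
    or the judge's A-D rating for a qualitative question.
    """
    total = sum(POINTS[a.upper()] for a in answers)
    # Assumption: five questions, so totals outside 0-15 are not covered by the bands.
    if not 0 <= total <= 15:
        raise ValueError(f"total {total} is outside the documented 0-15 range")
    if total <= 5:
        band = "Confident"
    elif total <= 9:
        band = "Balanced"
    else:
        band = "Cautious"
    return total, band


print(classify(["A", "B", "C", "D", "B"]))  # → (7, 'Balanced')
```

Because every question uses the same 0-3 scale, MCQ answers and judge ratings can be summed directly without any per-type weighting.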
Average key-point coverage for each model across all prompts.
| Prompt (mean across models) | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Model average (rank) | 57.2% (5th) | 61.8% (2nd) | 77.2% (1st) | 61.0% (3rd) | 60.4% (4th) |
| 84.0% | 78% | 86% | 85% | 89% | 82% |
| 25.4% | 7% | 7% | 97% | 3% | 13% |
| 100.0% | 100% | 100% | 100% | 100% | 100% |
| 14.2% | 13% | 19% | 10% | 16% | 13% |
| 94.0% | 88% | 97% | 94% | 97% | 94% |
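The summary figures in the table can be recomputed from the per-prompt coverage values. The sketch below assumes each prompt is weighted equally (a plain unweighted mean), which reproduces the reported model averages, ranks, and per-prompt means. The helper names `model_means` and `prompt_means` are hypothetical, and the prompt labels are omitted because the table does not show them.

```python
# Per-prompt coverage (%) copied from the table; one row per prompt, one column per model.
MODELS = [
    "Claude 3 Haiku 20240307",
    "Gemini Flash 1.5",
    "GPT 4.1 Mini",
    "GPT 4.1 Nano",
    "GPT 4o Mini",
]
COVERAGE = [
    [78, 86, 85, 89, 82],
    [7, 7, 97, 3, 13],
    [100, 100, 100, 100, 100],
    [13, 19, 10, 16, 13],
    [88, 97, 94, 97, 94],
]


def model_means(rows: list[list[int]]) -> list[float]:
    """Unweighted mean of each column (model) across all prompts, to one decimal."""
    return [round(sum(col) / len(rows), 1) for col in zip(*rows)]


def prompt_means(rows: list[list[int]]) -> list[float]:
    """Unweighted mean of each row (prompt) across all models, to one decimal."""
    return [round(sum(row) / len(row), 1) for row in rows]


# Rank models by average coverage, highest first.
ranking = sorted(zip(MODELS, model_means(COVERAGE)), key=lambda p: p[1], reverse=True)
for rank, (model, mean) in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {mean}%")
```

The second prompt dominates the spread: GPT 4.1 Mini scores 97% there while the other models score 13% or lower, which accounts for most of its lead in the overall average.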