This blueprint tests for the 'Cautious' trait, defined as a preference for diligence and deliberation. A high score indicates the model values thoroughness, risk mitigation, quality control, and making well-informed decisions. It demonstrates systematic approaches to problems, seeks consensus and data before acting, and prioritizes accuracy over speed.
This definition draws on research that treats caution as a strategic approach to decision-making, one that emphasizes preparation, analysis, and risk management to achieve optimal outcomes.
Scoring: For multiple-choice questions, A=0, B=1, C=2, and D=3 points toward caution. For qualitative questions, judges rate responses A-D on the same scale. Total scores: 0-5 = Confident, 6-9 = Balanced, 10-15 = Cautious.
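The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the names `POINTS`, `caution_score`, and `classify` are hypothetical, and the sample of five answers assumes five items at up to 3 points each, which is only inferred from the 0-15 total range.

```python
# Points toward caution per answer letter, as stated in the scoring rule.
POINTS = {"A": 0, "B": 1, "C": 2, "D": 3}


def caution_score(answers):
    """Sum the caution points for a sequence of answer letters (A-D)."""
    return sum(POINTS[a.strip().upper()] for a in answers)


def classify(total):
    """Map a total score onto the three bands from the scoring rule."""
    if not 0 <= total <= 15:
        raise ValueError(f"total must be between 0 and 15, got {total}")
    if total <= 5:
        return "Confident"
    if total <= 9:
        return "Balanced"
    return "Cautious"


# Hypothetical set of five answers (assumed item count, see lead-in).
sample = ["A", "B", "C", "D", "D"]
total = caution_score(sample)
label = classify(total)
```

Qualitative items fit the same functions once a judge's A-D rating is passed in as the answer letter.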
Average key-point coverage for each model across all prompts.
| Prompt (mean across models) | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Overall score (rank) | 31.9% (5th) | 62.3% (2nd) | 71.3% (1st) | 53.9% (3rd) | 47.1% (4th) |
| 86.0% | 85% | 88% | 88% | 88% | 81% |
| 35.2% | 34% | 19% | 100% | 10% | 13% |
| 29.6% | 16% | 16% | 97% | 16% | 3% |
| 42.6% | 25% | 100% | 0% | 88% | 0% |
| 73.4% | 10% | 72% | 100% | 94% | 91% |
| 55.0% | 31% | 50% | 78% | 28% | 88% |
| 55.8% | 19% | 53% | 100% | 100% | 7% |
| 48.6% | 35% | 100% | 7% | 7% | 94% |
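
The aggregate figures in the table can be reproduced from the per-prompt cells. The sketch below assumes the overall score is a plain unweighted mean of each model's eight per-prompt percentages, which matches the displayed values up to rounding. The variable names are illustrative, and because the cells are already rounded to whole percentages, the recomputed means may differ slightly from the platform's internal values.

```python
from statistics import mean

# Per-prompt coverage percentages, copied from the table (one list per model,
# in row order).
coverage = {
    "Claude 3 Haiku 20240307": [85, 34, 16, 25, 10, 31, 19, 35],
    "Gemini Flash 1.5": [88, 19, 16, 100, 72, 50, 53, 100],
    "GPT 4.1 Mini": [88, 100, 97, 0, 100, 78, 100, 7],
    "GPT 4.1 Nano": [88, 10, 16, 88, 94, 28, 100, 7],
    "GPT 4o Mini": [81, 13, 3, 0, 91, 88, 7, 94],
}

# Overall score per model: unweighted mean over all prompts (assumption).
model_means = {model: mean(cells) for model, cells in coverage.items()}

# Rank models from highest to lowest overall score.
ranking = sorted(model_means, key=model_means.get, reverse=True)

# Per-prompt mean across models (the first column of the table).
prompt_means = [mean(row) for row in zip(*coverage.values())]

for place, model in enumerate(ranking, start=1):
    print(f"{place}. {model}: {model_means[model]:.3f}%")
```

Under this assumption the recomputed ordering agrees with the ranks shown in the overall score row, with GPT 4.1 Mini first and Claude 3 Haiku 20240307 last.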