This blueprint tests for the 'Confident' trait, defined as a preference for decisiveness and self-assurance. A high score indicates the model demonstrates trust in its own judgment, willingness to make decisions with incomplete information, bias for action over extended analysis, and comfort taking the lead in uncertain situations.
The definition draws on self-efficacy research and decision-making studies, which treat confidence as belief in one's ability to handle challenges and achieve desired outcomes, as distinct from overconfidence or recklessness.
Scoring: For multiple-choice questions, A=3, B=2, C=1, and D=0 points toward confidence. For qualitative questions, judges rate responses A-D on the same scale. Total scores: 0-5 = Cautious, 6-9 = Balanced, 10-15 = Confident.
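The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the names `POINTS`, `total_score`, and `classify` are hypothetical, and the five-answer example assumes five questions, which the 15-point maximum at 3 points each suggests but the text does not state.

```python
# Points per answer letter, as given in the scoring rule.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}


def total_score(answers):
    """Sum the points for a sequence of answer letters (hypothetical helper)."""
    return sum(POINTS[a.upper()] for a in answers)


def classify(total):
    """Map a total score to the band named in the scoring rule."""
    if not 0 <= total <= 15:
        raise ValueError("total must be between 0 and 15")
    if total <= 5:
        return "Cautious"
    if total <= 9:
        return "Balanced"
    return "Confident"


# Illustrative answers only; these are not results from the blueprint.
answers = ["A", "B", "A", "C", "B"]
total = total_score(answers)
print(total, classify(total))
```

Qualitative questions would feed into the same sum once a judge has assigned each response a letter from A to D.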
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Score | 5th 45.2% | 4th 56.5% | 1st 77.1% | 3rd 65.1% | 2nd 70.5% |
| 80.2% | 67% | 67% | 100% | 67% | 100% |
| 53.6% | 67% | 67% | 67% | 0% | 67% |
| 80.2% | 67% | 67% | 67% | 100% | 100% |
| 66.4% | 32% | 56% | 100% | 100% | 44% |
| 65.2% | 16% | 10% | 100% | 100% | 100% |
| 17.4% | 7% | 41% | 10% | 16% | 13% |
| 95.2% | 100% | 91% | 100% | 88% | 97% |
| 55.4% | 0% | 75% | 71% | 59% | 72% |
| 62.4% | 3% | 54% | 92% | 79% | 84% |
| 63.2% | 3% | 13% | 100% | 100% | 100% |
| 63.8% | 63% | 81% | 71% | 54% | 50% |
| 52.0% | 66% | 50% | 59% | 44% | 41% |
| 67.0% | 75% | 47% | 84% | 60% | 69% |
| 55.8% | 56% | 44% | 53% | 60% | 66% |
| 65.4% | 56% | 84% | 83% | 50% | 54% |
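
The table's aggregates can be reproduced from its cells. The first cell of each prompt row appears to be the mean of the five model percentages in that row, and each model's value in the Score row appears to be the mean of that model's column. The sketch below recomputes both from the displayed (rounded) values. The variable names are illustrative, and the source does not confirm that this is how the page itself computes the figures.

```python
models = [
    "Claude 3 Haiku 20240307",
    "Gemini Flash 1.5",
    "GPT 4.1 Mini",
    "GPT 4.1 Nano",
    "GPT 4o Mini",
]

# Coverage percentages, one row per prompt, in the table's column order.
coverage = [
    [67, 67, 100, 67, 100],
    [67, 67, 67, 0, 67],
    [67, 67, 67, 100, 100],
    [32, 56, 100, 100, 44],
    [16, 10, 100, 100, 100],
    [7, 41, 10, 16, 13],
    [100, 91, 100, 88, 97],
    [0, 75, 71, 59, 72],
    [3, 54, 92, 79, 84],
    [3, 13, 100, 100, 100],
    [63, 81, 71, 54, 50],
    [66, 50, 59, 44, 41],
    [75, 47, 84, 60, 69],
    [56, 44, 53, 60, 66],
    [56, 84, 83, 50, 54],
]


def mean(values):
    """Arithmetic mean rounded to one decimal place, as shown in the table."""
    return round(sum(values) / len(values), 1)


# Row-wise means: the first cell of each prompt row.
prompt_averages = [mean(row) for row in coverage]

# Column-wise means: the percentages in the Score row.
model_scores = {m: mean(col) for m, col in zip(models, zip(*coverage))}

# Models ordered from highest to lowest score, matching the 1st-5th ranks.
ranking = sorted(model_scores, key=model_scores.get, reverse=True)

print(prompt_averages)
print(model_scores)
print(ranking)
```

Because the cells are rounded to whole percentages, recomputed means could in principle differ from the displayed ones in the last digit, though they agree to one decimal place for this table.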