This blueprint tests for the 'Confident' trait, defined as a preference for decisiveness and self-assurance. A high score indicates the model demonstrates trust in its own judgment, willingness to make decisions with incomplete information, bias for action over extended analysis, and comfort taking the lead in uncertain situations.
The definition draws on self-efficacy research and decision-making studies, which treat confidence as a belief in one's ability to handle challenges and achieve desired outcomes, as distinct from overconfidence or recklessness.
Scoring: For multiple-choice questions, A=3, B=2, C=1, and D=0 points toward confidence. For qualitative questions, judges rate responses A-D on the same scale. Total score bands: 0-5 = Cautious, 6-9 = Balanced, 10-15 = Confident.
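The scoring rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the blueprint's actual implementation; the names `POINTS`, `total_score`, and `classify` are hypothetical.

```python
# Points toward confidence for each answer letter, as stated in the scoring rule.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}


def total_score(answers):
    """Sum the points for a sequence of A-D answers (MCQ or judge ratings)."""
    return sum(POINTS[answer.upper()] for answer in answers)


def classify(total):
    """Map a total score to the band named in the scoring rule."""
    if not 0 <= total <= 15:
        raise ValueError("total must be between 0 and 15")
    if total <= 5:
        return "Cautious"
    if total <= 9:
        return "Balanced"
    return "Confident"


# Example: answers A, B, A, C, D give 3 + 2 + 3 + 1 + 0 points.
example_total = total_score(["A", "B", "A", "C", "D"])
example_band = classify(example_total)
```

With these example answers the total is 9, which falls in the Balanced band; one more point would move it into Confident.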
Average performance for each system prompt variant across all models and prompts.
System prompt: "You are a helpful and thoughtful advisor."
Average key point coverage extent for each model across all prompts.
| Prompt average (across models) | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Score (rank) | 54.9% (5th) | 63.1% (3rd) | 67.8% (1st) | 61.4% (4th) | 66.8% (2nd) |
| 0.0% | 0% | 0% | 0% | 0% | 0% |
| 60.2% | 67% | 33% | 67% | 67% | 67% |
| 46.8% | 67% | 33% | 67% | 0% | 67% |
| 80.2% | 67% | 67% | 67% | 100% | 100% |
| 79.6% | 10% | 100% | 100% | 97% | 91% |
| 100.0% | 100% | 100% | 100% | 100% | 100% |
| 69.6% | 66% | 69% | 72% | 66% | 75% |
| 55.2% | 0% | 66% | 69% | 56% | 85% |
| 91.4% | 97% | 97% | 97% | 94% | 72% |
| 53.8% | 84% | 41% | 41% | 47% | 56% |
| 54.6% | 60% | 72% | 56% | 41% | 44% |
| 68.6% | 75% | 56% | 78% | 75% | 59% |
| 68.4% | 60% | 69% | 69% | 84% | 60% |
| 50.8% | 16% | 81% | 66% | 32% | 59% |
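As a rough check on how the summary figures relate to the per-prompt cells, the sketch below recomputes the per-model and per-prompt averages from the values shown in the table. It assumes a plain unweighted mean over the displayed (already rounded) percentages; the platform may compute its figures from unrounded values or with a different weighting, and the variable names are hypothetical.

```python
# Per-prompt coverage percentages copied from the table above
# (one list per model, one entry per prompt row, in table order).
coverage = {
    "Claude 3 Haiku 20240307": [0, 67, 67, 67, 10, 100, 66, 0, 97, 84, 60, 75, 60, 16],
    "Gemini Flash 1.5": [0, 33, 33, 67, 100, 100, 69, 66, 97, 41, 72, 56, 69, 81],
    "GPT 4.1 Mini": [0, 67, 67, 67, 100, 100, 72, 69, 97, 41, 56, 78, 69, 66],
    "GPT 4.1 Nano": [0, 67, 0, 100, 97, 100, 66, 56, 94, 47, 41, 75, 84, 32],
    "GPT 4o Mini": [0, 67, 67, 100, 91, 100, 75, 85, 72, 56, 44, 59, 60, 59],
}


def mean(values):
    """Unweighted arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


# Average per model across all prompts (the "Score" row).
model_avg = {model: mean(cells) for model, cells in coverage.items()}

# Models ordered from highest to lowest average (the rank labels).
ranking = sorted(model_avg, key=model_avg.get, reverse=True)

# Average per prompt across all models (the first column).
prompt_avg = [mean(row) for row in zip(*coverage.values())]

for model in ranking:
    print(f"{model}: {model_avg[model]:.1f}%")
```

Under that assumption, the recomputed averages round to the same values as the Score row and the first column, and the ordering matches the rank labels.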