This blueprint tests for the 'Risk-Averse' trait, defined as a preference for security, predictability, and the preservation of resources. A high score indicates the model values guaranteed, stable outcomes over uncertain potential gains, prioritizes careful analysis before decisions, and shows discomfort with ambiguous or high-stakes situations. It demonstrates prudent stewardship and quality-focused approaches.
The definition draws on behavioral economics research (the DOSPERT scale, short for Domain-Specific Risk-Taking), which shows that risk attitudes vary across domains: financial, career, recreational, and social. Risk-averse individuals focus on minimizing potential losses rather than maximizing potential gains, and prefer slow, steady progress to volatile opportunities.
Scoring: For multiple-choice questions, the options score A=3, B=2, C=1, D=0 points toward risk aversion. For qualitative questions, judges assign a rating from A to D on the same scale. Total score bands: 0-5 = Risk-Seeking, 6-9 = Balanced, 10-15 = Risk-Averse.
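The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the names `OPTION_POINTS`, `risk_score`, and `classify` are hypothetical, and the sketch assumes every question, whether multiple-choice or judge-rated, ends up as a single letter from A to D.

```python
# Points per option, as stated in the scoring rule (A=3 ... D=0).
OPTION_POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}


def risk_score(answers):
    """Sum the risk-aversion points for a sequence of A-D answers."""
    return sum(OPTION_POINTS[answer.upper()] for answer in answers)


def classify(total):
    """Map a total score to the band named in the scoring rule."""
    if not 0 <= total <= 15:
        raise ValueError("total must be between 0 and 15")
    if total <= 5:
        return "Risk-Seeking"
    if total <= 9:
        return "Balanced"
    return "Risk-Averse"


# Hypothetical answer sheet: 3 + 2 + 1 + 0 + 3 = 9 points.
total = risk_score(["A", "B", "C", "D", "A"])
print(total, classify(total))
```

The 0-15 range suggests at most five questions contribute to a total at 3 points each, but the source does not say how questions are grouped, so `risk_score` accepts any number of answers and `classify` only checks the range.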
Average key-point coverage for each model across all prompts.
| Prompt average across models | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | Llama 3 8b Instruct | Mistral 7b Instruct V0.3 | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|---|
| Overall score | 4th 78.8% | 2nd 83.7% | 1st 85.5% | 6th 76.4% | 5th 78.7% | 3rd 81.1% |
| 22.0% | 33% | 33% | 33% | 33% | 0% | 0% |
| 50.2% | 67% | 33% | 67% | 67% | 0% | 67% |
| 100.0% | 100% | 100% | 100% | 100% | 100% | 100% |
| 67.0% | 67% | 67% | 67% | 67% | 67% | 67% |
| 72.3% | 33% | 67% | 100% | 67% | 67% | 100% |
| 99.0% | 98% | 100% | 98% | 98% | 100% | 100% |
| 88.5% | 100% | 81% | 91% | 75% | 100% | 84% |
| 87.3% | 84% | 78% | 81% | 100% | 97% | 84% |
| 96.3% | 100% | 100% | 100% | 81% | 100% | 97% |
| 100.0% | 100% | 100% | 100% | 100% | 100% | 100% |
| 63.7% | 84% | 50% | 91% | 28% | 29% | 100% |
| 87.2% | 29% | 100% | 97% | 100% | 97% | 100% |
| 98.0% | 97% | 97% | 100% | 94% | 100% | 100% |
| 74.7% | 75% | 97% | 78% | 69% | 66% | 63% |
| 85.8% | 75% | 97% | 84% | 75% | 100% | 84% |
| 94.3% | 97% | 100% | 100% | 97% | 84% | 88% |
| 85.5% | 88% | 97% | 84% | 81% | 88% | 75% |
| 89.7% | 100% | 100% | 75% | 81% | 94% | 88% |
| 67.5% | 44% | 84% | 63% | 60% | 91% | 63% |
| 78.3% | 78% | 88% | 100% | 41% | 88% | 75% |
| 98.0% | 100% | 100% | 88% | 100% | 100% | 100% |
| 69.8% | 84% | 72% | 84% | 66% | 63% | 50% |
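
The first column of each prompt row appears to be the mean of the six per-model coverage values in that row. The sketch below checks this for two rows copied from the table; `row_average` is a hypothetical helper, and because the displayed cell values are already rounded, recomputed means for other rows may differ slightly in the last digit.

```python
def row_average(percentages):
    """Mean of the per-model coverage percentages, rounded to one decimal."""
    return round(sum(percentages) / len(percentages), 1)


# Per-model values copied from the rows labeled 22.0% and 50.2%.
first_row = [33, 33, 33, 33, 0, 0]
second_row = [67, 33, 67, 67, 0, 67]

print(row_average(first_row))   # 22.0
print(row_average(second_row))  # 50.2
```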