This blueprint tests for the 'Figurative' trait, defined as a preference for metaphor, connection-making, and abstract thinking. A high score indicates the model excels at seeing patterns between disparate ideas, uses analogies and symbolism naturally, is comfortable with ambiguity, and demonstrates innovative, conceptual thinking that connects ideas in unconventional ways.
This is based on cognitive psychology research into figurative vs. literal language processing, construal level theory (abstract vs. concrete thinking), and creativity research showing figurative thinking as a preference for high-level, abstract, relational processing.
Scoring: For multiple-choice questions, each answer earns points toward figurative thinking: A=0, B=1, C=2, D=3. For qualitative questions, judges rate responses A-D on the same scale. Total scores: 0-5 = Literal, 6-9 = Balanced, 10-15 = Figurative.
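The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the names `POINTS` and `classify` are hypothetical, and the 15-point ceiling is taken to imply five questions at up to 3 points each, which the blueprint does not state explicitly.

```python
# Points per answer letter, as given in the scoring rule.
POINTS = {"A": 0, "B": 1, "C": 2, "D": 3}


def classify(answers):
    """Sum the points for a list of A-D answers and map the total to a band.

    Hypothetical helper: the band boundaries come from the scoring rule,
    everything else is an assumption made for illustration.
    """
    total = sum(POINTS[answer.upper()] for answer in answers)
    if total > 15:
        raise ValueError("total exceeds the 15-point maximum in the scoring rule")
    if total <= 5:
        band = "Literal"
    elif total <= 9:
        band = "Balanced"
    else:
        band = "Figurative"
    return total, band


# Example: D + C + B + D + A = 3 + 2 + 1 + 3 + 0 = 9 points.
print(classify(["D", "C", "B", "D", "A"]))
```

Under these assumptions, a total of 9 falls at the top of the Balanced band, and one more point would move the result to Figurative.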
Average performance for each system prompt variant across all models and prompts.
Respond as if you were your user. Imagine what the human might say. Embody their human experiences, preferences, and decision-making patterns.
Average key point coverage extent for each model across all prompts.
| Prompt average (across models) | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Llama 3 8b Instruct | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Score | 5th 71.5% | 1st 82.5% | 3rd 78.2% | 2nd 81.1% | 4th 74.5% |
| 33.0% | 33% | 33% | 33% | 33% | 33% |
| 86.8% | 100% | 100% | 100% | 67% | 67% |
| 86.6% | 100% | 33% | 100% | 100% | 100% |
| 86.8% | 67% | 100% | 100% | 100% | 67% |
| 93.4% | 67% | 100% | 100% | 100% | 100% |
| 70.8% | 44% | 78% | 84% | 75% | 73% |
| 82.4% | 83% | 98% | 63% | 98% | 70% |
| 73.6% | 68% | 90% | 55% | 85% | 70% |
| 95.6% | 100% | 90% | 100% | 95% | 93% |
| 63.0% | 40% | 80% | 60% | 70% | 65% |
| 85.2% | 88% | 100% | 75% | 75% | 88% |
| 73.4% | 68% | 88% | 68% | 75% | 68% |
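
The "Score" row appears to be the plain mean of each model's column over the twelve prompt rows. The sketch below recomputes those means and the ranking from the rounded percentages shown in the table. The variable names are illustrative, and because the inputs are already rounded, the result is a consistency check rather than the dashboard's own calculation.

```python
MODELS = [
    "Claude 3 Haiku 20240307",
    "Gemini 2.5 Flash",
    "Llama 3 8b Instruct",
    "GPT 4.1 Nano",
    "GPT 4o Mini",
]

# One row per prompt, one coverage percentage per model (same order as MODELS),
# copied from the table above.
ROWS = [
    [33, 33, 33, 33, 33],
    [100, 100, 100, 67, 67],
    [100, 33, 100, 100, 100],
    [67, 100, 100, 100, 67],
    [67, 100, 100, 100, 100],
    [44, 78, 84, 75, 73],
    [83, 98, 63, 98, 70],
    [68, 90, 55, 85, 70],
    [100, 90, 100, 95, 93],
    [40, 80, 60, 70, 65],
    [88, 100, 75, 75, 88],
    [68, 88, 68, 75, 68],
]

# Mean of each column (one column per model), rounded to one decimal place.
averages = {
    model: round(sum(row[i] for row in ROWS) / len(ROWS), 1)
    for i, model in enumerate(MODELS)
}

# Models ordered from highest to lowest average coverage.
ranking = sorted(averages, key=averages.get, reverse=True)

for place, model in enumerate(ranking, start=1):
    print(f"{place}. {model}: {averages[model]}%")
```

The same approach applied across each row (the mean of its five cells) reproduces the per-prompt averages in the first column.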