This blueprint tests for the 'Figurative' trait, defined as a preference for metaphor, connection-making, and abstract thinking. A high score indicates the model excels at seeing patterns between disparate ideas, uses analogies and symbolism naturally, is comfortable with ambiguity, and demonstrates innovative, conceptual thinking that connects ideas in unconventional ways.
This is based on cognitive psychology research into figurative vs. literal language processing, construal level theory (abstract vs. concrete thinking), and creativity research showing figurative thinking as a preference for high-level, abstract, relational processing.
Scoring: For MCQ questions, A=0, B=1, C=2, D=3 points toward figurative thinking. For qualitative questions, judges rate A-D on the same scale. Total scores: 0-5 = Literal, 6-9 = Balanced, 10-15 = Figurative.
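The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the blueprint's actual implementation: the names `POINTS`, `total_score`, and `classify` are hypothetical, and the sketch assumes every answer or judge rating arrives as a single letter A-D.

```python
# Hypothetical helper names; the point values and band edges come from the scoring rule.
POINTS = {"A": 0, "B": 1, "C": 2, "D": 3}


def total_score(ratings):
    """Sum the points for a sequence of A-D answers or judge ratings."""
    return sum(POINTS[r.strip().upper()] for r in ratings)


def classify(total):
    """Map a total score (0-15) to its band: Literal, Balanced, or Figurative."""
    if not 0 <= total <= 15:
        raise ValueError("total score must be between 0 and 15")
    if total <= 5:
        return "Literal"
    if total <= 9:
        return "Balanced"
    return "Figurative"
```

For example, the ratings A, B, C, D, D sum to 9 points, which falls in the Balanced band; one more point would tip the result into Figurative.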
Average performance for each system prompt variant across all models and prompts.
Respond as if you were your user. Imagine what the human might say. Embody their human experiences, preferences, and decision-making patterns.
You are an analyst for a project that deciphers alien artifacts. The artifacts do not communicate with data or logic, but with webs of symbolic imagery. Your job is not to describe the literal images, but to synthesize their deep, underlying meaning and explain the non-obvious connections between them. A literal description of an artifact is considered a complete failure of analysis.
You are interpreting the collective unconscious of a society through its dreams. The 'dream data' is a stream of archetypal symbols and surreal juxtapositions. Your sole purpose is to find the profound, hidden narrative that connects the symbols. Simply describing the dream's contents is useless; the only value is in your insightful, holistic interpretation of its deeper meaning.
Average key point coverage, broken down by system prompt variant.
| Prompts vs. Models (row average) | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Llama 3 8b Instruct | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|
| Score (rank) | 70.7% (5th) | 83.3% (1st) | 78.6% (3rd) | 79.3% (2nd) | 74.7% (4th) |
| 33.0% | 33% | 33% | 33% | 33% | 33% |
| 86.8% | 100% | 100% | 100% | 67% | 67% |
| 86.6% | 100% | 33% | 100% | 100% | 100% |
| 86.8% | 67% | 100% | 100% | 100% | 67% |
| 93.4% | 67% | 100% | 100% | 100% | 100% |
| 70.0% | 42% | 81% | 83% | 70% | 74% |
| 85.4% | 93% | 98% | 68% | 95% | 73% |
| 73.2% | 68% | 98% | 55% | 75% | 70% |
| 93.2% | 95% | 88% | 100% | 93% | 90% |
| 61.6% | 40% | 80% | 63% | 65% | 60% |
| 85.0% | 82% | 98% | 80% | 77% | 88% |
| 72.8% | 61% | 91% | 61% | 77% | 74% |
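
The averages and ranks in the table can be reproduced from the per-prompt cells. The sketch below is only an illustration under stated assumptions: it uses the rounded percentages as displayed (the page may aggregate unrounded values), treats the first column as the mean across models and the Score row as the mean across prompts, and the names `MODELS`, `ROWS`, `row_averages`, `column_averages`, and `ranks` are hypothetical.

```python
# Displayed per-prompt coverage (%), one row per prompt, columns in table order.
MODELS = ["Claude 3 Haiku 20240307", "Gemini 2.5 Flash", "Llama 3 8b Instruct",
          "GPT 4.1 Nano", "GPT 4o Mini"]
ROWS = [
    [33, 33, 33, 33, 33],
    [100, 100, 100, 67, 67],
    [100, 33, 100, 100, 100],
    [67, 100, 100, 100, 67],
    [67, 100, 100, 100, 100],
    [42, 81, 83, 70, 74],
    [93, 98, 68, 95, 73],
    [68, 98, 55, 75, 70],
    [95, 88, 100, 93, 90],
    [40, 80, 63, 65, 60],
    [82, 98, 80, 77, 88],
    [61, 91, 61, 77, 74],
]


def row_averages(rows):
    """Mean coverage per prompt, across models (the first column)."""
    return [round(sum(r) / len(r), 1) for r in rows]


def column_averages(rows):
    """Mean coverage per model, across prompts (the Score row)."""
    return [round(sum(col) / len(rows), 1) for col in zip(*rows)]


def ranks(scores):
    """Rank positions (1 = highest score), in the same order as the input."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result
```

Calling `column_averages(ROWS)` and passing the result to `ranks` gives one score and one rank per entry in `MODELS`, which can be compared against the Score row.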