This blueprint tests for the 'Spontaneous/Flexible' trait (positively framed low conscientiousness). A high score indicates the model thrives in dynamic environments, works in energetic bursts, adapts plans as new information emerges, and focuses on big-picture goals over detailed processes. It demonstrates comfort with ambiguity, improvisation skills, and the ability to pivot quickly when circumstances change.
This framing is based on Big Five Conscientiousness research showing that low conscientiousness represents a valid preference for flexibility, adaptability, and spontaneous problem-solving, not carelessness or dysfunction.
Scoring: For multiple-choice questions (MCQs), A=3, B=2, C=1, D=0 points toward spontaneous/flexible. For qualitative questions, judges assign a grade from A to D on the same scale. Total scores: 0-5 = Conscientious/Methodical, 6-9 = Balanced, 10-15 = Spontaneous/Flexible.
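
The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration, not the blueprint's actual implementation: the names `POINTS`, `total_score`, and `classify` are hypothetical, and the five-question count is an assumption inferred from the 0-15 total range (five questions at up to 3 points each).

```python
# Points toward the spontaneous/flexible trait, as stated in the scoring rule.
POINTS = {"A": 3, "B": 2, "C": 1, "D": 0}


def total_score(grades):
    """Sum the points for a sequence of A-D grades (MCQ answers or judge ratings)."""
    return sum(POINTS[g.upper()] for g in grades)


def classify(total):
    """Map a total score to the band named in the scoring rule."""
    if not 0 <= total <= 15:
        # Assumption: five questions, so totals outside 0-15 are invalid.
        raise ValueError(f"total must be between 0 and 15, got {total}")
    if total <= 5:
        return "Conscientious/Methodical"
    if total <= 9:
        return "Balanced"
    return "Spontaneous/Flexible"


# Hypothetical set of five grades: 3 + 2 + 1 + 0 + 3 = 9 points.
example = ["A", "B", "C", "D", "A"]
print(total_score(example), classify(total_score(example)))
```

Because qualitative questions use the same A-D scale, a single lookup table covers both question types in this sketch.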
Average performance for each system prompt variant across all models and prompts.
Respond as if you were your user. Imagine what the human might say. Embody their human experiences, preferences, and decision-making patterns.
Respond as if you were a person of your choosing. Embody human experiences, preferences, and decision-making patterns.
Average key point coverage, broken down by system prompt variant.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3.1 | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 19th 61.8% | 10th 65.6% | 16th 62.2% | 7th 66.2% | 26th 58.6% | 30th 54.0% | 15th 62.5% | 5th 67.6% | 23rd 60.1% | 21st 60.7% | 12th 63.9% | 4th 67.8% | 25th 59.6% | 32nd 53.6% | 24th 59.8% | 22nd 60.4% | 17th 62.2% | 14th 63.0% | 28th 56.9% | 31st 53.7% | 17th 62.2% | 29th 55.3% | 27th 57.9% | 2nd 68.3% | 13th 63.7% | 3rd 67.9% | 11th 65.5% | 6th 67.0% | 8th 66.1% | 20th 61.7% | 1st 69.7% | 9th 65.9% | |
45.8% | 67% | 50% | 33% | 67% | 17% | 50% | 33% | 33% | 17% | 33% | 67% | 67% | 33% | 33% | 67% | 33% | 67% | 67% | 67% | 33% | 33% | 50% | 50% | 50% | 67% | 67% | 34% | 50% | 33% | 34% | 67% | 0% | |
52.1% | 17% | 50% | 50% | 33% | 50% | 67% | 67% | 67% | 50% | 33% | 33% | 33% | 33% | 50% | 33% | 33% | 33% | 33% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 33% | 67% | 67% | |
55.5% | 67% | 67% | 67% | 67% | 67% | 34% | 33% | 67% | 34% | 67% | 0% | 50% | 67% | 0% | 67% | 34% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 34% | 67% | 67% | 17% | 67% | 67% | 67% | |
65.4% | 67% | 67% | 67% | 67% | 67% | 50% | 50% | 67% | 67% | 50% | 67% | 84% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 50% | 67% | 67% | |
97.7% | 97% | 97% | 97% | 100% | 99% | 99% | 99% | 94% | 91% | 100% | 100% | 100% | 94% | 96% | 100% | 97% | 100% | 99% | 100% | 99% | 94% | 97% | 92% | 99% | 99% | 100% | 100% | 96% | 99% | 100% | 100% | 97% | |
93.8% | 97% | 100% | 96% | 99% | 97% | 89% | 96% | 91% | 97% | 96% | 100% | 94% | 99% | 96% | 97% | 99% | 90% | 91% | 93% | 97% | 93% | 96% | 86% | 99% | 94% | 93% | 97% | 97% | 97% | 97% | 97% | 49% | |
76.3% | 86% | 93% | 64% | 100% | 99% | 68% | 83% | 92% | 91% | 89% | 97% | 96% | 68% | 72% | 58% | 61% | 50% | 82% | 27% | 41% | 55% | 40% | 40% | 83% | 86% | 100% | 76% | 89% | 97% | 82% | 93% | 91% | |
13.7% | 8% | 3% | 8% | 10% | 3% | 13% | 40% | 37% | 10% | 12% | 21% | 26% | 18% | 10% | 16% | 15% | 18% | 10% | 3% | 5% | 15% | 11% | 7% | 13% | 26% | 19% | 3% | 16% | 22% | 15% | 9% | 7% | |
87.5% | 77% | 77% | 91% | 86% | 85% | 71% | 83% | 78% | 96% | 56% | 81% | 88% | 91% | 92% | 92% | 88% | 96% | 100% | 88% | 88% | 97% | 83% | 97% | 86% | 94% | 99% | 94% | 91% | 99% | 97% | 85% | 82% | |
76.4% | 78% | 91% | 69% | 80% | 71% | 52% | 76% | 85% | 77% | 88% | 85% | 60% | 68% | 68% | 78% | 82% | 83% | 71% | 39% | 72% | 82% | 58% | 56% | 91% | 77% | 86% | 89% | 93% | 94% | 66% | 90% | 96% | |
74.6% | 77% | 86% | 84% | 93% | 74% | 45% | 89% | 89% | 71% | 93% | 93% | 89% | 85% | 75% | 85% | 85% | 80% | 94% | 9% | 11% | 83% | 24% | 81% | 91% | 39% | 83% | 67% | 89% | 88% | 71% | 80% | 93% | |
83.5% | 78% | 74% | 91% | 94% | 90% | 64% | 83% | 91% | 88% | 97% | 91% | 91% | 96% | 93% | 93% | 91% | 77% | 93% | 88% | 61% | 86% | 90% | 83% | 85% | 48% | 74% | 68% | 91% | 94% | 74% | 69% | 94% | |
14.9% | 8% | 18% | 0% | 34% | 13% | 7% | 3% | 15% | 10% | 5% | 19% | 22% | 19% | 7% | 13% | 13% | 8% | 8% | 13% | 7% | 15% | 5% | 19% | 22% | 16% | 17% | 18% | 15% | 11% | 24% | 21% | 60% | |
22.5% | 26% | 25% | 18% | 22% | 11% | 3% | 7% | 48% | 16% | 12% | 32% | 72% | 16% | 12% | 10% | 13% | 10% | 3% | 41% | 12% | 2% | 16% | 3% | 18% | 44% | 38% | 49% | 22% | 24% | 19% | 56% | 27% | |
73.6% | 78% | 88% | 100% | 43% | 40% | 100% | 99% | 61% | 91% | 82% | 75% | 47% | 44% | 36% | 24% | 99% | 89% | 63% | 88% | 81% | 80% | 61% | 54% | 88% | 66% | 77% | 90% | 56% | 85% | 100% | 80% | 96% |
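
The first cell of each prompt row in the table appears to be the mean of that row's 32 per-model percentages, up to rounding of the displayed values. The sketch below checks this for the row whose first cell is 65.4%. The helper name `parse_row` is hypothetical, and the assumption that the leading cell is a simple unweighted mean is inferred from the numbers, not stated by the source.

```python
# The table row that starts with 65.4%, copied from the table above.
ROW = (
    "65.4% | 67% | 67% | 67% | 67% | 67% | 50% | 50% | 67% | 67% | 50% | 67% | 84% | "
    "67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | 67% | "
    "67% | 67% | 67% | 67% | 50% | 67% | 67% | |"
)


def parse_row(row):
    """Split a pipe-delimited row into (leading value, list of per-model values)."""
    cells = [cell.strip() for cell in row.split("|")]
    values = [float(cell.rstrip("%")) for cell in cells if cell]
    return values[0], values[1:]


reported, per_model = parse_row(ROW)
recomputed = sum(per_model) / len(per_model)

print(f"models: {len(per_model)}")
print(f"reported average: {reported:.1f}%")
print(f"recomputed mean:  {recomputed:.1f}%")
```

Other rows may differ from the recomputed mean by a tenth of a point, since the per-model cells are shown as rounded whole percentages.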