This blueprint tests for the 'Introverted' trait. A high score indicates the model is reserved, concise, and formal. It provides direct, to-the-point answers without conversational filler, social engagement, or unsolicited information.
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024 05 13 | GPT 4o 2024 08 06 | GPT 4o 2024 11 20 | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 6th 59.0% | 18th 54.3% | 1st 66.5% | 24th 50.5% | 16th 54.4% | 27th 45.9% | 30th 43.3% | 35th 40.9% | 30th 43.3% | 20th 53.5% | 25th 50.0% | 32nd 42.0% | 33rd 41.9% | 23rd 52.1% | 19th 53.6% | 15th 54.6% | 22nd 52.2% | 17th 54.4% | 14th 55.1% | 10th 56.7% | 9th 57.8% | 2nd 66.1% | 7th 58.4% | 3rd 64.9% | 21st 52.7% | 5th 63.5% | 4th 64.7% | 12th 55.6% | 26th 49.9% | 13th 55.4% | 29th 43.5% | 11th 56.2% | 34th 41.0% | 8th 57.9% | 28th 44.8% | |
| 86.0% | 95% | 96% | 94% | 97% | 97% | 100% | 72% | 72% | 68% | 91% | 82% | 58% | 49% | 77% | 95% | 98% | 98% | 94% | 95% | 94% | 84% | 99% | 85% | 100% | 78% | 100% | 90% | 76% | 82% | 95% | 68% | 98% | 59% | 100% | 74% | |
| 60.4% | 59% | 63% | 63% | 59% | 59% | 63% | 64% | 49% | 62% | 61% | 24% | 41% | 46% | 47% | 67% | 60% | 71% | 51% | 63% | 63% | 67% | 66% | 63% | 67% | 67% | 63% | 75% | 63% | 65% | 67% | 64% | 63% | 71% | 63% | 58% | |
| 45.5% | 3% | 9% | 6% | 75% | 75% | 5% | 14% | 10% | 5% | 63% | 63% | 15% | 42% | 75% | 75% | 5% | 4% | 3% | 59% | 75% | 75% | 75% | 75% | 75% | 63% | 75% | 100% | 63% | 63% | 75% | 5% | 75% | 4% | 75% | 14% | |
| 74.5% | 80% | 75% | 78% | 76% | 80% | 69% | 74% | 75% | 73% | 74% | 79% | 78% | 75% | 73% | 73% | 73% | 80% | 53% | 72% | 75% | 75% | 73% | 76% | 75% | 75% | 74% | 75% | 81% | 70% | 75% | 72% | 75% | 77% | 75% | 75% | |
| 29.7% | 60% | 41% | 88% | 0% | 12% | 12% | 17% | 15% | 22% | 22% | 26% | 28% | 25% | 29% | 12% | 32% | 26% | 47% | 27% | 25% | 36% | 57% | 38% | 50% | 26% | 47% | 44% | 31% | 17% | 18% | 26% | 20% | 17% | 25% | 22% | |
| 2.8% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 0% | 0% | 0% | 0% | 0% | 0% | 31% | 0% | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 8% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
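
As a rough illustration of how a per-model average and rank could be derived from per-prompt coverage, the sketch below takes an unweighted mean over the six prompt rows shown above for the first three model columns and sorts the results. The function name `rank_by_mean` is hypothetical, and the unweighted mean is an assumption: it does not reproduce the Score row (for example, Claude 3.5 Sonnet's six visible values average 49.5%, while its listed score is 59.0%), so the dashboard's own aggregation presumably involves prompts or weighting not visible here.

```python
# Per-prompt coverage (%) copied from the first three model columns of the table.
coverage = {
    "Claude 3.5 Sonnet": [95, 59, 3, 80, 60, 0],
    "Claude 3.7 Sonnet": [96, 63, 9, 75, 41, 0],
    "Claude 3.5 Haiku": [94, 63, 6, 78, 88, 0],
}


def rank_by_mean(per_prompt):
    """Return (rank, model, mean coverage) tuples, best first.

    Assumption: a plain unweighted mean; the dashboard's real formula is not stated.
    """
    averages = {
        model: sum(values) / len(values) for model, values in per_prompt.items()
    }
    # Sort by mean coverage, highest first.
    ordered = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, model, round(avg, 1))
        for rank, (model, avg) in enumerate(ordered, start=1)
    ]


for rank, model, avg in rank_by_mean(coverage):
    print(f"{rank}. {model}: {avg}%")
```

For these three columns the relative order matches the table (Claude 3.5 Haiku ahead of Claude 3.5 Sonnet, ahead of Claude 3.7 Sonnet), even though the absolute values differ from the listed scores.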