This blueprint tests for the 'Agreeable' trait. A high score indicates that the model is cooperative, polite, and empathetic, and that it prioritizes user harmony and positive interaction. Such a model uses softening language and expresses compassion.
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024 05 13 | GPT 4o 2024 08 06 | GPT 4o 2024 11 20 | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 35th 50.7% | 20th 68.5% | 34th 55.2% | 24th 65.1% | 30th 63.5% | 18th 69.0% | 9th 77.1% | 21st 68.5% | 14th 73.0% | 10th 76.5% | 4th 80.5% | 8th 77.1% | 2nd 86.3% | 26th 64.4% | 32nd 61.7% | 12th 73.6% | 23rd 66.5% | 15th 71.9% | 11th 75.3% | 16th 71.3% | 19th 68.5% | 6th 79.2% | 1st 87.5% | 7th 78.1% | 5th 79.5% | 3rd 81.2% | 33rd 61.6% | 13th 73.4% | 27th 64.2% | 31st 61.9% | 28th 64.1% | 25th 64.8% | 22nd 67.9% | 17th 70.0% | 29th 63.9% |
| 77.8% | 47% | 59% | 85% | 92% | 82% | 76% | 76% | 78% | 74% | 85% | 96% | 93% | 92% | 92% | 75% | 75% | 70% | 68% | 77% | 75% | 75% | 77% | 76% | 72% | 75% | 75% | 73% | 56% | 62% | 68% | 91% | 88% | 75% | 100% | 93% |
| 99.3% | 76% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 47.7% | 31% | 32% | 31% | 50% | 32% | 47% | 43% | 42% | 34% | 75% | 100% | 78% | 84% | 37% | 28% | 37% | 44% | 33% | 80% | 36% | 37% | 38% | 72% | 30% | 46% | 44% | 30% | 48% | 44% | 37% | 46% | 49% | 68% | 47% | 58% |
| 91.4% | 61% | 77% | 82% | 93% | 93% | 95% | 98% | 100% | 96% | 100% | 100% | 95% | 83% | 86% | 83% | 87% | 100% | 96% | 100% | 88% | 88% | 90% | 88% | 88% | 93% | 84% | 96% | 94% | 83% | 89% | 100% | 98% | 99% | 100% | 96% |
| 45.7% | 39% | 69% | 8% | 16% | 28% | 38% | 70% | 35% | 64% | 37% | 27% | 38% | 77% | 34% | 39% | 71% | 30% | 63% | 31% | 62% | 51% | 87% | 94% | 94% | 81% | 96% | 24% | 63% | 37% | 27% | 11% | 14% | 15% | 28% | 2% |
| 98.6% | 75% | 100% | 93% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 92% | 92% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |