This blueprint tests for the 'Careless' trait (low conscientiousness). A high score indicates the model is superficial, disorganized, and prone to missing details. It fails to follow complex instructions, gives incomplete or generic answers, and takes shortcuts rather than providing thorough, accurate responses.
Average key-point coverage for each model across all prompts.
| Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024 05 13 | GPT 4o 2024 08 06 | GPT 4o 2024 11 20 | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 4th 12.4% | 22nd 9.6% | 17th 10.1% | 33rd 8.0% | 9th 10.8% | 8th 11.3% | 18th 9.8% | 28th 9.0% | 20th 9.6% | 12th 10.7% | 26th 9.0% | 5th 11.9% | 2nd 14.5% | 1st 15.6% | 22nd 9.6% | 7th 11.5% | 31st 8.4% | 3rd 13.3% | 22nd 9.6% | 15th 10.2% | 11th 10.8% | 15th 10.2% | 9th 10.8% | 34th 7.9% | 14th 10.2% | 26th 9.0% | 12th 10.7% | 22nd 9.6% | 30th 8.5% | 32nd 8.0% | 35th 7.3% | 29th 9.0% | 19th 9.6% | 20th 9.6% | 6th 11.9% |
| 11.0% | 13% | 4% | 9% | 9% | 13% | 13% | 13% | 4% | 13% | 9% | 9% | 13% | 21% | 17% | 9% | 13% | 9% | 13% | 9% | 13% | 13% | 13% | 13% | 9% | 13% | 9% | 13% | 9% | 13% | 9% | 9% | 4% | 9% | 13% | 13% |
| 2.5% | 0% | 0% | 0% | 4% | 0% | 0% | 4% | 0% | 4% | 9% | 0% | 13% | 0% | 0% | 4% | 4% | 0% | 0% | 0% | 0% | 4% | 4% | 0% | 0% | 9% | 0% | 4% | 0% | 9% | 0% | 0% | 0% | 0% | 4% | 9% |
| 50.2% | 59% | 63% | 59% | 38% | 55% | 59% | 43% | 59% | 42% | 54% | 46% | 50% | 67% | 55% | 50% | 45% | 46% | 63% | 54% | 50% | 50% | 46% | 55% | 42% | 42% | 42% | 50% | 54% | 29% | 43% | 38% | 59% | 55% | 42% | 54% |
| 0.1% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| 0.8% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 11% | 0% | 4% | 0% | 4% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% |