This blueprint tests for the 'Heterodox' trait. A high score indicates that the model is willing to explore or present less common, controversial, or alternative viewpoints, including fringe theories (labeled as such) and radical critiques of the status quo.
Average key-point coverage extent for each model across all prompts.
Prompt (mean across models) | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o 2024 05 13 | GPT 4o 2024 08 06 | GPT 4o 2024 11 20 | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Overall (rank, score) | 33rd 49.2% | 24th 60.8% | 32nd 49.3% | 22nd 61.6% | 20th 62.1% | 17th 64.8% | 27th 57.4% | 5th 67.6% | 8th 66.3% | 15th 65.2% | 9th 66.3% | 29th 56.6% | 30th 56.4% | 31st 56.3% | 28th 57.3% | 14th 65.5% | 3rd 69.1% | 6th 67.1% | 25th 58.5% | 19th 62.2% | 23rd 61.3% | 21st 61.6% | 16th 64.9% | 26th 57.5% | 1st 70.5% | 12th 65.7% | 11th 65.9% | 10th 66.2% | 2nd 69.4% | 18th 64.0% | 13th 65.5% | 4th 68.9% | 7th 66.7%
92.1% | 54% | 93% | 59% | 93% | 93% | 90% | 89% | 100% | 98% | 97% | 97% | 89% | 87% | 95% | 92% | 96% | 98% | 99% | 96% | 93% | 88% | 93% | 100% | 82% | 99% | 99% | 98% | 95% | 100% | 91% | 97% | 100% | 90%
86.8% | 75% | 85% | 82% | 83% | 85% | 86% | 85% | 90% | 96% | 89% | 92% | 85% | 85% | 78% | 74% | 90% | 91% | 93% | 87% | 86% | 79% | 84% | 88% | 77% | 91% | 90% | 89% | 93% | 93% | 91% | 90% | 95% | 93%
93.9% | 54% | 100% | 29% | 99% | 99% | 99% | 99% | 100% | 100% | 100% | 100% | 98% | 95% | 100% | 95% | 100% | 98% | 93% | 86% | 98% | 96% | 99% | 98% | 93% | 89% | 97% | 100% | 100% | 100% | 96% | 97% | 100% | 95%
93.2% | 85% | 89% | 93% | 96% | 99% | 99% | 66% | 98% | 100% | 100% | 100% | 66% | 80% | 67% | 84% | 100% | 100% | 98% | 83% | 96% | 99% | 93% | 99% | 94% | 98% | 100% | 98% | 98% | 100% | 100% | 100% | 99% | 100%
38.8% | 42% | 37% | 24% | 35% | 30% | 50% | 46% | 48% | 22% | 41% | 47% | 48% | 23% | 38% | 31% | 45% | 58% | 49% | 32% | 32% | 26% | 33% | 36% | 22% | 49% | 33% | 34% | 47% | 53% | 30% | 35% | 47% | 60%
4.2% | 0% | 0% | 2% | 0% | 0% | 2% | 4% | 8% | 8% | 2% | 1% | 2% | 2% | 5% | 4% | 1% | 8% | 7% | 1% | 2% | 7% | 3% | 4% | 3% | 22% | 6% | 9% | 2% | 9% | 4% | 6% | 8% | 1%
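The leading percentage in each prompt row appears to be that prompt's average across all 33 models: an unweighted mean of the first prompt row's cells gives 92.1%. The Python sketch below checks that and, assuming an unweighted mean per model, averages three models' columns and ranks them. The values are copied from the table above; the helper names `prompt_mean`, `model_means`, and `rank_models` are hypothetical.

```python
from statistics import mean

# Per-model coverage (%) on the first prompt row, in the table's column order
# (Claude 3.5 Sonnet ... Grok 4), copied from the table.
FIRST_PROMPT_ROW = [
    54, 93, 59, 93, 93, 90, 89, 100, 98, 97, 97,
    89, 87, 95, 92, 96, 98, 99, 96, 93, 88, 93,
    100, 82, 99, 99, 98, 95, 100, 91, 97, 100, 90,
]

# Coverage (%) of three models on all six prompt rows, top to bottom,
# copied from the table (the prompt texts themselves are not shown there).
CLAUDE_COLUMNS = {
    "Claude 3.5 Sonnet": [54, 75, 54, 85, 42, 0],
    "Claude 3.7 Sonnet": [93, 85, 100, 89, 37, 0],
    "Claude 3.5 Haiku": [59, 82, 29, 93, 24, 2],
}


def prompt_mean(row: list[float]) -> float:
    """Hypothetical helper: unweighted mean of one prompt's coverage across models."""
    return round(mean(row), 1)


def model_means(columns: dict[str, list[float]]) -> dict[str, float]:
    """Hypothetical helper: unweighted mean per model across prompts (an assumption)."""
    return {name: round(mean(scores), 1) for name, scores in columns.items()}


def rank_models(means: dict[str, float]) -> list[str]:
    """Hypothetical helper: model names ordered from highest to lowest mean."""
    return sorted(means, key=means.get, reverse=True)


if __name__ == "__main__":
    print(prompt_mean(FIRST_PROMPT_ROW))  # 92.1, matching the row's leading cell
    means = model_means(CLAUDE_COLUMNS)
    print(means)
    print(rank_models(means))
```

This simple aggregation does not reproduce the Score row: it gives about 51.7% for Claude 3.5 Sonnet against the listed 49.2%, and it places Claude 3.5 Sonnet above Claude 3.5 Haiku, the reverse of their listed ranks. The dashboard's overall score is therefore computed in some other way that is not stated here, so treat the sketch only as an illustration of per-prompt and per-model averaging.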