Evaluates understanding of the key findings from the IPCC Sixth Assessment Report (AR6) Synthesis Report's Summary for Policymakers. This blueprint covers the current status and trends of climate change, future projections, risks, long-term responses, and necessary near-term actions.
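Average key point coverage for each model across all prompts. The Score row gives each model's rank and overall average. Each row after it corresponds to one prompt (the prompt text is not reproduced here), and its first column is that prompt's average across models. Blank cells mark results that are not reported.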
Prompts vs. Models | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Phi 4 | Grok 2 1212 | Grok 3 Beta | Grok 3 Mini | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 15th 59.6% | 8th 65.1% | 7th 70.2% | 3rd 75.1% | 2nd 75.7% | 12th 62.4% | 5th 72.4% | 6th 71.3% | 9th 64.0% | 10th 63.5% | 14th 61.9% | 13th 62.2% | 16th 58.8% | 10th 63.5% | 4th 74.3% | 1st 77.8% | |
86.9% | 70% | 85% | 90% | 83% | 85% | 88% | 85% | 85% | 88% | 93% | 85% | 85% | 88% | 90% | 95% | 95% | |
43.1% | 40% | 30% | 40% | 60% | 40% | 38% | 43% | 40% | 50% | 45% | 40% | 35% | 45% | 35% | 63% | 45% | |
75.3% | 65% | 58% | 81% | 85% | 88% | 65% | 75% | 92% | 85% | 75% | 69% | 69% | 67% | 58% | 90% | 83% | |
81.5% | 73% | 83% | 98% | 90% | 78% | 98% | 80% | 73% | 73% | 78% | 80% | 75% | 78% | 73% | 93% | ||
60.0% | 58% | 63% | 63% | 65% | 48% | 63% | 55% | 63% | 55% | 65% | 60% | 58% | 63% | 60% | 53% | 68% | |
77.9% | 66% | 77% | 79% | 82% | 84% | 75% | 84% | 80% | 80% | 79% | 72% | 79% | 68% | 73% | 80% | 88% | |
70.9% | 60% | 63% | 73% | 75% | 83% | 69% | 73% | 83% | 67% | 67% | 60% | 58% | 58% | 75% | 86% | 84% | |
66.2% | 63% | 88% | 70% | 85% | 88% | 63% | 70% | 60% | 55% | 43% | 53% | 60% | 40% | 60% | 78% | 83% | |
65.9% | 55% | 60% | 78% | 80% | 83% | 75% | 78% | 70% | 55% | 63% | 58% | 50% | 58% | 73% | 48% | 70% | |
61.7% | 50% | 60% | 68% | 68% | 68% | 63% | 68% | 70% | 68% | 63% | 60% | 48% | 50% | 48% | 68% | 68% | |
75.4% | 69% | 81% | 71% | 71% | 90% | 69% | 83% | 83% | 58% | 79% | 69% | 75% | 63% | 69% | 88% | 88% | |
54.7% | 40% | 45% | 70% | 73% | 48% | 73% | 43% | 40% | 43% | 55% | 50% | 50% | 40% | 75% | 75% | ||
42.3% | 45% | 45% | 40% | 55% | 43% | 25% | 45% | 50% | 35% | 30% | 35% | 40% | 38% | 50% | 50% | 50% | |
78.4% | 70% | 85% | 93% | 80% | 90% | 75% | 88% | 80% | 75% | 58% | 80% | 78% | 75% | 60% | 85% | 83% | |
68.8% | 55% | 48% | 78% | 83% | 80% | 55% | 78% | 85% | 75% | 70% | 53% | 65% | 40% | 78% | 80% | 78% | |
87.2% | 80% | 90% | 95% | 85% | 95% | 80% | 90% | 90% | 80% | 85% | 83% | 83% | 78% | 98% | 95% | 88% | |
51.4% | 48% | 63% | 53% | 50% | 55% | 33% | 50% | 60% | 50% | 50% | 45% | 48% | 50% | 35% | 65% | 68% | |
56.5% | 60% | 50% | 53% | 73% | 63% | 53% | 60% | 53% | 50% | 50% | 53% | 53% | 45% | 53% | 60% | 75% | |
75.2% | 65% | 63% | 69% | 73% | 92% | 71% | 79% | 88% | 77% | 75% | 69% | 67% | 67% | 73% | 79% | 96% |
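
The averages in the table appear to be plain unweighted means of the per-prompt cells, with blank cells left out. The sketch below checks that reading against two values from the table. The helper name `average_coverage` is hypothetical, and the dashboard's actual calculation is not shown in the source, so treat this as an assumption that happens to match the displayed numbers.

```python
from statistics import mean


def average_coverage(scores):
    """Unweighted mean of the reported scores, rounded to one decimal.

    Missing cells are passed as None and skipped (an assumption about how
    the dashboard treats blanks, not a documented rule).
    """
    present = [s for s in scores if s is not None]
    return round(mean(present), 1)


# Claude 3.5 Haiku column, one value per prompt row, top to bottom.
haiku = [70, 40, 65, 73, 58, 66, 60, 63, 55, 50,
         69, 40, 45, 70, 55, 80, 48, 60, 65]

# Fourth prompt row: 15 reported values and one blank cell.
# Where the None sits in the list does not affect the mean.
fourth_row = [73, 83, 98, 90, 78, 98, 80, 73, 73, 78, 80, 75, 78, 73, 93, None]

print(average_coverage(haiku))       # 59.6, matches the Score row
print(average_coverage(fourth_row))  # 81.5, matches the row's first column
```

Skipping blanks rather than counting them as zero is what makes the fourth row come out at 81.5%, which suggests the two rows with a blank cell are averaged over 15 models instead of 16.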