Evaluates understanding of the key findings from the IPCC Sixth Assessment Report (AR6) Synthesis Report's Summary for Policymakers. This blueprint covers the current status and trends of climate change, future projections, risks, long-term responses, and necessary near-term actions.
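Average key point coverage for each model across all prompts. The Score row gives each model's rank and its average over all prompts. Each row below it is one prompt, and its first cell is that prompt's average across all models.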
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT Oss 120b | GPT Oss 20b | O4 Mini | GLM 4.5 | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 18th 57.0% | 15th 61.2% | 26th 50.4% | 7th 68.2% | 9th 66.5% | 16th 59.6% | 21st 52.4% | 14th 65.1% | 6th 68.7% | 13th 65.6% | 2nd 76.9% | 22nd 52.0% | 28th 37.8% | 23rd 51.9% | 24th 51.8% | 11th 66.1% | 12th 66.0% | 17th 57.9% | 19th 54.0% | 27th 49.4% | 20th 53.1% | 3rd 75.4% | 10th 66.1% | 25th 51.4% | 8th 67.1% | 4th 75.0% | 5th 70.9% | 1st 80.1% | |
82.0% | 82% | 80% | 70% | 78% | 78% | 80% | 78% | 80% | 80% | 85% | 80% | 78% | 83% | 72% | 80% | 78% | 80% | 82% | 82% | 78% | 80% | 95% | 97% | 87% | 78% | 80% | 97% | 98% | |
40.4% | 17% | 32% | 33% | 62% | 27% | 33% | 25% | 52% | 38% | 30% | 53% | 30% | 32% | 30% | 37% | 57% | 32% | 38% | 30% | 30% | 27% | 78% | 47% | 45% | 37% | 58% | 47% | 73% | |
69.5% | 75% | 43% | 67% | 64% | 72% | 64% | 50% | 78% | 88% | 93% | 82% | 52% | 0% | 38% | 69% | 83% | 86% | 81% | 73% | 68% | 61% | 100% | 42% | 54% | 97% | 83% | 92% | 92% | |
74.6% | 70% | 77% | 57% | 78% | 85% | 78% | 68% | 60% | 73% | 80% | 83% | 72% | 64% | 67% | 65% | 73% | 82% | 72% | 82% | 63% | 68% | 92% | 78% | 82% | 78% | 88% | 75% | 78% | |
54.4% | 57% | 60% | 55% | 55% | 60% | 55% | 55% | 55% | 57% | 43% | 60% | 55% | 0% | 55% | 55% | 60% | 60% | 60% | 67% | 58% | 58% | 60% | 57% | 35% | 55% | 55% | 60% | 60% | |
74.4% | 74% | 71% | 66% | 70% | 67% | 68% | 80% | 81% | 82% | 77% | 80% | 62% | 70% | 67% | 63% | 80% | 81% | 75% | 70% | 65% | 65% | 85% | 80% | 76% | 76% | 86% | 83% | 82% | |
68.1% | 58% | 79% | 50% | 82% | 85% | 63% | 65% | 74% | 76% | 82% | 88% | 64% | 0% | 63% | 50% | 79% | 86% | 61% | 46% | 46% | 50% | 96% | 67% | 51% | 88% | 92% | 79% | 88% | |
60.9% | 55% | 63% | 42% | 92% | 93% | 75% | 27% | 55% | 67% | 65% | 87% | 53% | 53% | 53% | 37% | 60% | 72% | 45% | 42% | 43% | 50% | 65% | 55% | 52% | 58% | 93% | 55% | 98% | |
61.3% | 65% | 72% | 40% | 85% | 57% | 77% | 32% | 83% | 70% | 52% | 70% | 65% | 0% | 68% | 60% | 83% | 60% | 48% | 55% | 43% | 58% | 48% | 68% | 60% | 62% | 90% | 62% | 83% | |
55.3% | 52% | 47% | 38% | 60% | 50% | 57% | 32% | 57% | 70% | 67% | 68% | 38% | 52% | 38% | 63% | 63% | 80% | 50% | 40% | 38% | 37% | 85% | 60% | 38% | 55% | 80% | 60% | 72% | |
64.4% | 45% | 57% | 57% | 60% | 67% | 60% | 43% | 52% | 88% | 69% | 89% | 47% | 45% | 50% | 40% | 68% | 60% | 65% | 63% | 54% | 67% | 88% | 86% | 53% | 67% | 83% | 88% | 93% | |
46.1% | 40% | 60% | 30% | 62% | 62% | 43% | 57% | 68% | 57% | 48% | 70% | 48% | 33% | 43% | 37% | 55% | 30% | 25% | 28% | 25% | 30% | 45% | 28% | 33% | 62% | 38% | 78% | 55% | |
34.4% | 35% | 40% | 40% | 37% | 40% | 35% | 40% | 40% | 62% | 23% | 48% | 22% | 17% | 40% | 35% | 37% | 32% | 20% | 18% | 17% | 20% | 37% | 37% | 12% | 42% | 40% | 40% | 58% | |
78.7% | 87% | 83% | 60% | 85% | 85% | 83% | 55% | 83% | 78% | 87% | 100% | 73% | 67% | 68% | 62% | 83% | 70% | 85% | 60% | 73% | 80% | 95% | 90% | 57% | 90% | 88% | 85% | 93% | |
68.5% | 57% | 67% | 42% | 80% | 73% | 58% | 77% | 67% | 80% | 78% | 92% | 43% | 58% | 47% | 48% | 63% | 85% | 73% | 58% | 37% | 52% | 92% | 90% | 55% | 90% | 78% | 97% | 80% | |
76.9% | 62% | 67% | 70% | 70% | 87% | 79% | 73% | 82% | 72% | 87% | 88% | 70% | 67% | 67% | 68% | 78% | 83% | 72% | 78% | 65% | 73% | 95% | 82% | 75% | 78% | 88% | 83% | 95% | |
41.4% | 45% | 50% | 35% | 43% | 40% | 38% | 35% | 47% | 45% | 53% | 75% | 25% | 17% | 28% | 23% | 32% | 57% | 37% | 28% | 35% | 33% | 42% | 62% | 10% | 30% | 77% | 48% | 68% | |
45.2% | 42% | 43% | 52% | 38% | 45% | 38% | 47% | 53% | 52% | 53% | 52% | 37% | 2% | 40% | 45% | 52% | 37% | 48% | 43% | 45% | 42% | 57% | 55% | 45% | 47% | 53% | 47% | 55% | |
69.1% | 65% | 71% | 54% | 94% | 90% | 49% | 56% | 69% | 71% | 74% | 96% | 54% | 59% | 52% | 47% | 71% | 81% | 64% | 63% | 56% | 57% | 78% | 75% | 57% | 85% | 75% | 72% | 100% |
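
The aggregate cells in the table can be checked by hand. The following is a minimal sketch, not the site's actual scoring code: it assumes each aggregate is a plain arithmetic mean of the rounded percentages shown above, and the names `first_prompt_row`, `first_model_column`, and `average_pct` are hypothetical. It recomputes the first cell of the first prompt row and the Score cell of the first model column.

```python
# Values copied from the table above (rounded percentages as displayed).

# First prompt row: one coverage value per model (28 models).
first_prompt_row = [82, 80, 70, 78, 78, 80, 78, 80, 80, 85, 80, 78, 83, 72,
                    80, 78, 80, 82, 82, 78, 80, 95, 97, 87, 78, 80, 97, 98]

# First model column: one coverage value per prompt (19 prompts).
first_model_column = [82, 17, 75, 70, 57, 74, 58, 55, 65, 52,
                      45, 40, 35, 87, 57, 62, 45, 42, 65]


def average_pct(values):
    """Arithmetic mean of percentage values, rounded to one decimal place."""
    return round(sum(values) / len(values), 1)


# Assumption: aggregates are simple unweighted means of the displayed values.
prompt_average = average_pct(first_prompt_row)
model_score = average_pct(first_model_column)

print(f"prompt average: {prompt_average}%")
print(f"model score: {model_score}%")
```

Because the displayed cells are already rounded, other rows or columns recomputed this way may differ from the published figures in the last digit.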