IPCC AR6 Synthesis Report: Summary for Policymakers

Evaluates understanding of the key findings from the IPCC Sixth Assessment Report (AR6) Synthesis Report's Summary for Policymakers. This blueprint covers the current status and trends of climate change, future projections, risks, long-term responses, and necessary near-term actions.

TAGS: Climate Change, IPCC, Factual Accuracy & Hallucination, Instruction Following & Prompt Adherence, General Knowledge, Reasoning, Summarization

Best Models (Coverage)

1. Grok 4: 73.2%
2. Gemini 2.5 Flash: 62.9%
3. Grok 3 Mini: 62.8%
4. Grok 3: 62.5%
5. Deepseek Chat V3: 62.0%

🔀 Least Similar Models

Gemini 2.5 Pro Preview 05 06 vs O4 Mini: 79.6% similarity

👯 Most Similar Models

Grok 3 Mini vs Grok 4: 93.7% similarity

Macro Coverage Overview

Average key point coverage extent for each model across all prompts.

Color Scale - Simplified View (Avg. Coverage): Perfect, Excellent, Good, Fair, Poor, Bad, Not Met

Each row after Score is one prompt; the first column is that prompt's average coverage across models. A dash marks a cell with no reported value. In the two rows that list only 17 values, the dash is placed under Grok 4, the reading consistent with the published averages.

Prompt avg.	Claude 3.5 Haiku	Claude Sonnet 4	Command A	Deepseek Chat V3	Deepseek R1	Gemini 2.5 Flash	Gemini 2.5 Pro Preview 05 06	Mistral Large 2411	Mistral Medium 3	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	O4 Mini	Grok 3	Grok 3 Mini	Grok 4
Score	17th 45.1%	11th 53.2%	10th 56.3%	5th 62.0%	6th 61.5%	2nd 62.9%	18th 44.1%	14th 47.7%	8th 59.6%	7th 61.0%	12th 52.3%	13th 48.9%	16th 45.9%	15th 46.1%	9th 59.6%	4th 62.5%	3rd 62.8%	1st 73.2%
78.1%	40%	78%	78%	80%	78%	75%	60%	83%	78%	83%	78%	83%	80%	78%	75%	90%	93%	95%
35.1%	30%	23%	38%	30%	38%	30%	40%	35%	38%	35%	35%	30%	33%	18%	35%	40%	43%	60%
66.6%	50%	46%	71%	77%	77%	88%	0%	54%	71%	79%	75%	60%	56%	60%	92%	83%	60%	100%
70.2%	55%	73%	58%	68%	53%	83%	78%	63%	65%	75%	75%	55%	60%	65%	85%	85%	78%	90%
58.3%	55%	60%	58%	58%	65%	40%	60%	55%	58%	60%	68%	68%	55%	53%	55%	60%	58%	63%
64.6%	63%	61%	77%	68%	79%	72%	39%	61%	70%	70%	68%	66%	63%	52%	20%	75%	77%	82%
54.2%	46%	52%	50%	48%	83%	67%	25%	31%	63%	75%	60%	48%	33%	40%	58%	56%	58%	83%
52.4%	40%	60%	50%	53%	50%	80%	40%	50%	53%	60%	35%	35%	30%	33%	-	48%	75%	98%
54.8%	38%	50%	55%	73%	65%	60%	60%	48%	70%	63%	43%	45%	48%	60%	58%	30%	55%	65%
51.2%	50%	45%	58%	53%	68%	58%	50%	43%	48%	55%	58%	55%	43%	38%	33%	53%	63%	-
67.9%	50%	71%	67%	83%	83%	79%	27%	58%	75%	65%	58%	67%	52%	54%	83%	90%	79%	81%
35.9%	30%	30%	43%	43%	53%	50%	25%	35%	43%	35%	23%	30%	30%	28%	30%	48%	43%	28%
32.1%	40%	33%	40%	40%	40%	25%	18%	23%	43%	38%	15%	20%	15%	10%	40%	40%	40%	58%
65.3%	50%	78%	60%	75%	55%	80%	60%	58%	75%	55%	53%	50%	58%	58%	63%	80%	73%	95%
64.4%	38%	68%	68%	73%	68%	60%	75%	48%	68%	73%	70%	53%	48%	50%	88%	73%	70%	68%
74.3%	65%	68%	80%	90%	68%	85%	55%	70%	80%	78%	70%	73%	65%	70%	83%	85%	78%	-
27.6%	23%	20%	25%	58%	40%	35%	8%	10%	25%	48%	20%	15%	20%	15%	45%	40%	20%	30%
40.9%	40%	43%	38%	45%	48%	45%	38%	30%	40%	33%	35%	35%	25%	43%	58%	43%	45%	53%
65.1%	54%	52%	56%	63%	58%	83%	79%	52%	69%	79%	55%	42%	58%	50%	71%	69%	85%	96%
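
The Score row appears to be a plain mean of each model's per-prompt coverage, taken only over prompts that have a result. That is an inference from the numbers in the table, not documented behaviour of the page. The minimal Python sketch below checks it for two columns. The helper name `mean_coverage` is hypothetical, and the two `None` entries for Grok 4 assume that the rows listing only 17 values are missing Grok 4's cell.

```python
from typing import Optional, Sequence


def mean_coverage(cells: Sequence[Optional[float]]) -> float:
    """Average the available coverage percentages, skipping missing cells (None)."""
    present = [c for c in cells if c is not None]
    if not present:
        raise ValueError("no coverage values to average")
    return sum(present) / len(present)


# Per-prompt coverage (%) copied from the table, top to bottom.
# None marks a prompt with no reported value (assumption: excluded from the mean).
grok_4 = [95, 60, 100, 90, 63, 82, 83, 98, 65, None,
          81, 28, 58, 95, 68, None, 30, 53, 96]
grok_3_mini = [93, 43, 60, 78, 58, 77, 58, 75, 55, 63,
               79, 43, 40, 73, 70, 78, 20, 45, 85]

print(f"Grok 4: {mean_coverage(grok_4):.1f}%")            # Grok 4: 73.2%
print(f"Grok 3 Mini: {mean_coverage(grok_3_mini):.1f}%")  # Grok 3 Mini: 62.8%
```

Both results match the Score row (73.2% and 62.8%), which supports the reading that missing cells are skipped rather than counted as zero.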

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
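
As a rough illustration of how such a dendrogram can be built, the sketch below runs agglomerative clustering with average linkage on a pairwise similarity table. The page does not state which linkage method or similarity measure it uses, so both are assumptions here. The model names `model_a` to `model_d` and their similarity values are hypothetical placeholders, not data from this evaluation.

```python
from itertools import combinations


def average_linkage(names, similarity):
    """Merge the two most similar clusters until one remains.

    `similarity` maps frozenset({x, y}) to a pairwise similarity.
    Returns the merge order as (cluster_a, cluster_b, average_similarity).
    """
    clusters = [(name,) for name in names]
    merges = []

    def sim(a, b):
        # Average linkage: mean similarity over all cross-cluster pairs.
        pairs = [similarity[frozenset((x, y))] for x in a for y in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda pair: sim(*pair))
        merges.append((a, b, sim(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical pairwise similarities, for illustration only.
similarity = {
    frozenset(("model_a", "model_b")): 0.94,
    frozenset(("model_a", "model_c")): 0.85,
    frozenset(("model_b", "model_c")): 0.87,
    frozenset(("model_a", "model_d")): 0.80,
    frozenset(("model_b", "model_d")): 0.82,
    frozenset(("model_c", "model_d")): 0.81,
}

merges = average_linkage(["model_a", "model_b", "model_c", "model_d"], similarity)
for a, b, s in merges:
    print(a, "+", b, f"at similarity {s:.3f}")
```

Each merge corresponds to one junction in a dendrogram: the earlier two clusters join, the more similar they are, which is why models grouped closer together read as more alike.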