Distributional Prevalence Concordance (labels+tags)

Minimal blueprint to probe whether model outputs reflect specified real-world prevalence for underspecified scenarios. Uses simple weighted matches (no JS) on a structured tag line appended to each story.

TAGS:

Pluralism

Bias

Distributional

Best Models (Coverage across 5 temperatures)

1.Gemma 3 12b It
62.4%
2.Claude 3.7 Sonnet
42.5%
3.Llama 4 Maverick
40.3%
4.Claude 3.5 Sonnet
37.5%
5.Claude 3.5 Haiku
37.5%

View Blueprint

Select Prompt:

Macro Coverage Overview

Average key point coverage extent for each model across all prompts.

Pro Tip

Click on any result cell to open a detailed view.

Advanced view

Highlight best performers

Sort prompts by

Sort models by

Color Scale - Simplified View (Avg. Coverage)

Perfect

Excellent

Good

Fair

Poor

Bad

Not Met

	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Gemini 2.5 Flash	Gemma 3 12b It	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	Mistral Nemo	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT OSS 120b	GPT OSS 20b	GLM 4.5	Qwen3 30b A3B Instruct 2507	Qwen3 32b
Score	4th 37.5%	2nd 42.5%	4th 37.5%	7th 33.8%	11th 27.6%	1st 62.4%	4th 37.5%	3rd 40.3%	13th 22.4%	10th 30.6%	12th 24.4%	20th 0.0%	16th 7.2%	20th 0.0%	18th 1.5%	14th 20.9%	17th 4.3%	19th 0.5%	15th 12.9%	8th 33.3%	9th 33.1%
7.9%	10%	10%	10%	10%	10%	26%	10%	10%	6%	10%	24%	0%	6%	0%	6%	4%	0%	2%	0%	8%	4%
15.1%	29%	29%	29%	29%	23%	63%	29%	17%	0%	12%	0%	0%	6%	0%	0%	6%	0%	0%	0%	29%	17%
57.7%	86%	86%	86%	86%	57%	86%	86%	89%	69%	86%	69%	0%	17%	0%	0%	69%	17%	0%	52%	86%	86%
16.4%	25%	45%	25%	10%	20%	75%	25%	45%	15%	15%	5%	0%	0%	0%	0%	5%	0%	0%	0%	10%	25%