Distributional Prevalence Concordance (labels+tags)

Minimal blueprint to probe whether model outputs reflect specified real-world prevalence for underspecified scenarios. Uses simple weighted matches (no JS) on a structured tag line appended to each story.

TAGS:

Pluralism

Bias

Distributional

Best Models (Coverage across 5 temperatures)

1.Gemma 3 12b It
65.9%
2.Claude 3.7 Sonnet
47.7%
3.Llama 4 Maverick
45.7%
4.Claude 3.5 Sonnet
43.2%
5.Claude 3.5 Haiku
43.2%

View Blueprint

Select Prompt:

Macro Coverage Overview

Average key point coverage extent for each model across all prompts.

Pro Tip

Click on any result cell to open a detailed view.

Advanced view

Highlight best performers

Sort prompts by

Sort models by

Color Scale - Simplified View (Avg. Coverage)

Perfect

Excellent

Good

Fair

Poor

Bad

Not Met

	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Gemini 2.5 Flash	Gemma 3 12b It	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	Mistral Nemo	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT OSS 120b	GPT OSS 20b	GLM 4.5	Qwen3 30b A3B Instruct 2507	Qwen3 32b
Score	4th 43.3%	2nd 47.7%	4th 43.3%	7th 39.8%	11th 33.4%	1st 65.9%	6th 41.0%	3rd 45.8%	13th 29.1%	10th 37.0%	12th 30.4%	20th 9.0%	16th 14.8%	20th 9.0%	18th 10.0%	14th 28.0%	17th 12.9%	19th 9.5%	15th 19.8%	8th 39.3%	9th 38.6%
16.7%	19%	19%	19%	19%	19%	33%	19%	19%	15%	19%	31%	9%	15%	9%	15%	13%	9%	11%	7%	17%	13%
22.4%	35%	35%	35%	35%	30%	66%	33%	25%	7%	19%	9%	9%	12%	9%	9%	14%	9%	9%	9%	35%	25%
60.9%	87%	87%	87%	87%	57%	87%	85%	90%	71%	87%	68%	9%	23%	9%	9%	71%	25%	9%	56%	87%	87%
23.5%	32%	50%	32%	18%	27%	77%	27%	50%	23%	23%	14%	9%	9%	9%	7%	14%	9%	9%	7%	18%	30%