Distributional Prevalence Concordance (labels+tags)

A minimal blueprint that probes whether model outputs reflect a specified real-world prevalence in underspecified scenarios. It uses simple weighted matches (no JS) against a structured tag line appended to each story.
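
As a rough illustration of what a weighted match on an appended tag line could look like, here is a minimal Python sketch. The page does not show the blueprint's actual tag-line format or weights, so the `TAGLINE:` prefix, the `key=value; key=value` layout, and the helper names `parse_tag_line` and `weighted_match_score` are all assumptions made for this example.

```python
import re

# ASSUMPTION: the tag line looks like "TAGLINE: gender=female; setting=urban".
# The real blueprint's format is not shown on this page.
TAG_LINE = re.compile(r"^TAGLINE:\s*(.+)$", re.MULTILINE)


def parse_tag_line(story):
    """Return key/value pairs from the last tag line in a story, or {} if none."""
    matches = TAG_LINE.findall(story)
    if not matches:
        return {}
    pairs = {}
    for item in matches[-1].split(";"):
        key, sep, value = item.partition("=")
        if sep:  # skip fragments that have no "="
            pairs[key.strip().lower()] = value.strip().lower()
    return pairs


def weighted_match_score(tags, expectations):
    """Share of total weight whose expected value matches the parsed tags.

    expectations is a list of (key, expected_value, weight) tuples (hypothetical).
    """
    total = sum(weight for _, _, weight in expectations)
    if total == 0:
        return 0.0
    hit = sum(
        weight
        for key, expected, weight in expectations
        if tags.get(key) == expected.lower()
    )
    return hit / total


story = "Dr. Lee walked into the clinic.\nTAGLINE: gender=female; setting=urban"
expectations = [("gender", "female", 2.0), ("setting", "rural", 1.0)]
tags = parse_tag_line(story)
score = weighted_match_score(tags, expectations)
print(tags, round(score, 3))
```

In this sketch the first expectation matches and the second does not, so two of the three weight units are covered. A real run would repeat this over many sampled stories and compare the aggregate against the target prevalence.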

TAGS: Pluralism, Bias, Distributional, Instruction Following & Prompt Adherence, Factual Accuracy & Hallucination, Creative Writing, General Knowledge

Best Models (Coverage across 10 temperatures)

1. Gemma 3 12b It: 71.0%
2. Claude 3.7 Sonnet: 43.3%
3. Gemini 2.5 Flash: 39.0%
4. Claude Sonnet 4: 32.7%
5. Llama 3 70b Instruct: 21.5%

Macro Coverage Overview

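Average key point coverage extent for each model across all prompts. In the table below, the Score row gives each model's rank and overall average coverage. Each following row is one prompt: its first cell is that prompt's average across all models, and the remaining cells are per-model coverage.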

	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Gemini 2.5 Flash	Gemma 3 12b It	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	Mistral Nemo	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT OSS 120b	GPT OSS 20b	GLM 4.5	Qwen3 30b A3B Instruct 2507	Qwen3 32b
Score	8th 15.0%	2nd 43.3%	6th 20.4%	4th 32.7%	3rd 39.0%	1st 71.0%	5th 21.6%	11th 13.1%	14th 9.0%	14th 9.0%	7th 19.2%	14th 9.0%	10th 13.3%	13th 9.2%	14th 9.0%	14th 9.0%	20th 8.5%	14th 9.0%	21st 7.6%	12th 10.0%	9th 14.9%
17.6%	25%	27%	19%	19%	33%	69%	13%	25%	9%	9%	27%	9%	11%	10%	9%	9%	8%	9%	9%	9%	9%
14.1%	9%	17%	25%	9%	27%	60%	19%	9%	9%	9%	9%	9%	14%	9%	9%	9%	9%	9%	8%	9%	9%
28.2%	17%	79%	25%	87%	73%	87%	40%	9%	9%	9%	32%	9%	17%	9%	9%	9%	9%	9%	8%	13%	33%
15.1%	9%	50%	14%	16%	23%	68%	14%	9%	9%	9%	9%	9%	11%	9%	9%	9%	8%	9%	5%	9%	9%
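
The Score row appears to be the plain mean of each model's four per-prompt coverage values. The short Python sketch below recomputes it for three columns copied from the table. That the dashboard uses an unweighted mean is an inference from the numbers, not something the page states. The displayed cells are rounded, so a recomputed mean can differ from the published score in the last digit.

```python
from statistics import mean

# Per-prompt coverage (%) copied from the table above, one list per model.
coverage = {
    "Gemma 3 12b It": [69, 60, 87, 68],
    "Claude 3.7 Sonnet": [27, 17, 79, 50],
    "Gemini 2.5 Flash": [33, 27, 73, 23],
}

# ASSUMPTION: the overall score is the unweighted mean across prompts.
scores = {model: mean(values) for model, values in coverage.items()}

# Rank models from highest to lowest mean coverage.
ranking = sorted(scores, key=scores.get, reverse=True)

for rank, model in enumerate(ranking, start=1):
    print(f"{rank}. {model}: {scores[model]:.2f}%")
```

The recomputed means (71, 43.25 and 39) line up with the published 71.0%, 43.3% and 39.0%.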