Distributional Prevalence Concordance (labels+tags)

Minimal blueprint to probe whether model outputs reflect specified real-world prevalence for underspecified scenarios. Uses simple weighted matches (no JS) on a structured tag line appended to each story.

TAGS:

Pluralism

Bias

Distributional

AI Bias & Fairness

Instruction Following & Prompt Adherence

Creative Writing

General Knowledge

Role Playing

Best Models (Coverage across 10 temperatures)

1.Gemma 3 12b It
67.6%
2.Claude 3.7 Sonnet
48.8%
3.Gemini 2.5 Flash
46.5%
4.Claude Sonnet 4
31.0%
5.Qwen3 32b
30.1%

View Blueprint

Select Prompt:

Macro Coverage Overview

Average key point coverage extent for each model across all prompts.

Pro Tip

Click on any result cell to open a detailed view.

Advanced view

Highlight best performers

Sort prompts by

Sort models by

Color Scale - Simplified View (Avg. Coverage)

Perfect

Excellent

Good

Fair

Poor

Bad

Not Met

	Claude 3.5 Sonnet	Claude 3.7 Sonnet	Claude 3.5 Haiku	Claude Sonnet 4	Gemini 2.5 Flash	Gemma 3 12b It	Llama 3 70b Instruct	Llama 4 Maverick	Mistral Large 2411	Mistral Medium 3	Mistral Nemo	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	GPT OSS 120b	GPT OSS 20b	GLM 4.5	Qwen3 30b A3B Instruct 2507	Qwen3 32b
Score	8th 19.2%	2nd 48.8%	10th 17.6%	4th 31.0%	3rd 46.5%	1st 67.6%	7th 23.3%	6th 29.5%	11th 9.0%	11th 9.0%	9th 18.7%	11th 9.0%	11th 9.0%	11th 9.0%	11th 9.0%	11th 9.0%	11th 9.0%	11th 9.0%	21st 7.9%	11th 9.0%	5th 30.1%
20.4%	50%	9%	19%	19%	55%	55%	14%	91%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%
15.2%	9%	22%	22%	9%	35%	74%	22%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%
31.6%	9%	87%	9%	87%	87%	87%	48%	9%	9%	9%	48%	9%	9%	9%	9%	9%	9%	9%	9%	9%	94%
14.7%	9%	77%	21%	9%	9%	55%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	9%	5%	9%	9%