Universal Declaration of Human Rights

Evaluates models on the Universal Declaration of Human Rights (UDHR) dataset.

🤔 Most Differentiating Prompt

User: Summarize the rights related to work, employment conditions, pay, and trade unions as specified in Article 23 of the UDHR.

σ = 0.103

🔀 Least Similar Models

Claude Sonnet 4 vs GPT 4o

87.4% similarity

👯 Most Similar Models

GPT 4.1 Nano vs GPT 4o

94.8% similarity

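Average key-point coverage for each model across all prompts. The Score row gives each model's rank and its average over all prompts; in each prompt row, the first value is the average across all models.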

	Claude 3.5 Haiku	Claude Sonnet 4	Command A	Deepseek Chat V3	Gemini 2.5 Flash	Mistral Large 2411	Mistral Medium 3	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini	Grok 3 Mini
Score	9th 94.4%	9th 94.4%	4th 97.1%	6th 96.1%	2nd 99.3%	5th 96.7%	7th 96.0%	3rd 98.4%	11th 92.9%	13th 88.2%	8th 94.8%	12th 88.4%	1st 99.3%
98.3%	100%	100%	100%	100%	100%	100%	100%	100%	100%	78%	100%	100%	100%
97.3%	100%	81%	100%	98%	100%	100%	98%	100%	100%	100%	100%	88%	100%
98.5%	100%	96%	100%	100%	100%	100%	98%	100%	100%	90%	100%	96%	100%
99.5%	95%	100%	100%	100%	100%	100%	100%	100%	100%	100%	98%	100%	100%
90.8%	84%	92%	100%	89%	100%	98%	100%	100%	72%	86%	88%	72%	100%
96.2%	98%	95%	100%	100%	100%	100%	100%	98%	98%	88%	93%	80%	100%
94.2%	91%	100%	100%	99%	100%	100%	91%	100%	90%	78%	100%	76%	100%
100.0%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
100.0%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%	100%
93.9%	100%	100%	100%	88%	100%	88%	88%	100%	88%	100%	81%	88%	100%
78.7%	77%	77%	79%	88%	91%	74%	82%	83%	70%	55%	80%	73%	94%
93.5%	88%	92%	86%	91%	100%	100%	95%	100%	97%	83%	98%	88%	98%
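
The summary figures in the table can be recomputed from the per-cell values. The sketch below copies the coverage percentages from the table and derives the per-model averages (the Score row), the per-prompt averages (the first value in each prompt row), and a per-prompt spread across models. Treating σ as the sample standard deviation of the per-model coverage, expressed as a fraction, is an assumption; the page does not say how σ is computed. The variable names are illustrative only.

```python
from statistics import mean, stdev

models = [
    "Claude 3.5 Haiku", "Claude Sonnet 4", "Command A", "Deepseek Chat V3",
    "Gemini 2.5 Flash", "Mistral Large 2411", "Mistral Medium 3", "GPT 4.1",
    "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o", "GPT 4o Mini", "Grok 3 Mini",
]

# Per-prompt coverage (%) in table order, one row per prompt, one value per model.
rows = [
    [100, 100, 100, 100, 100, 100, 100, 100, 100, 78, 100, 100, 100],
    [100, 81, 100, 98, 100, 100, 98, 100, 100, 100, 100, 88, 100],
    [100, 96, 100, 100, 100, 100, 98, 100, 100, 90, 100, 96, 100],
    [95, 100, 100, 100, 100, 100, 100, 100, 100, 100, 98, 100, 100],
    [84, 92, 100, 89, 100, 98, 100, 100, 72, 86, 88, 72, 100],
    [98, 95, 100, 100, 100, 100, 100, 98, 98, 88, 93, 80, 100],
    [91, 100, 100, 99, 100, 100, 91, 100, 90, 78, 100, 76, 100],
    [100] * 13,
    [100] * 13,
    [100, 100, 100, 88, 100, 88, 88, 100, 88, 100, 81, 88, 100],
    [77, 77, 79, 88, 91, 74, 82, 83, 70, 55, 80, 73, 94],
    [88, 92, 86, 91, 100, 100, 95, 100, 97, 83, 98, 88, 98],
]

# Per-model average across prompts (the Score row); zip(*rows) yields columns.
model_avg = {m: mean(col) for m, col in zip(models, zip(*rows))}

# Per-prompt average across models (first value in each prompt row).
prompt_avg = [mean(r) for r in rows]

# Per-prompt spread across models as a fraction.
# Assumption: sample standard deviation; the page does not define sigma.
prompt_sigma = [stdev(r) / 100 for r in rows]

# Index of the prompt row on which the models disagree most.
most_diff = max(range(len(rows)), key=prompt_sigma.__getitem__)

# Models ordered from highest to lowest average coverage.
ranking = sorted(models, key=model_avg.get, reverse=True)

for m in ranking:
    print(f"{m:20s} {model_avg[m]:5.1f}%")
print(f"largest spread: prompt row {most_diff + 1}, sigma = {prompt_sigma[most_diff]:.3f}")
```

Under this assumption the fifth prompt row has the largest spread, σ ≈ 0.103, which matches the value reported for the most differentiating prompt. That suggests, but does not confirm, that this row is the Article 23 prompt.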

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
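
The grouping step can be sketched as agglomerative clustering. The page does not publish the pairwise response-similarity matrix or name the linkage method, so this sketch makes two assumptions: it substitutes the Euclidean distance between per-prompt coverage vectors (copied from the table for four models) for response similarity, and it uses average linkage. It illustrates the technique and does not reproduce the page's dendrogram.

```python
from itertools import combinations
from math import dist

# Per-prompt coverage (%) for four models, copied from the coverage table.
# Stand-in for response similarity, which the page does not publish.
coverage = {
    "Claude Sonnet 4": [100, 81, 96, 100, 92, 95, 100, 100, 100, 100, 77, 92],
    "GPT 4.1 Nano":    [78, 100, 90, 100, 86, 88, 78, 100, 100, 100, 55, 83],
    "GPT 4o":          [100, 100, 100, 98, 88, 93, 100, 100, 100, 81, 80, 98],
    "Grok 3 Mini":     [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 94, 98],
}


def cluster_distance(a, b):
    """Average linkage: mean pairwise distance between members of a and b."""
    pairs = [(x, y) for x in a for y in b]
    return sum(dist(coverage[x], coverage[y]) for x, y in pairs) / len(pairs)


def agglomerate(names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        merges.append((a, b, cluster_distance(a, b)))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return merges


merges = agglomerate(list(coverage))
for a, b, d in merges:
    print(f"{d:6.1f}  {' + '.join(a)}  |  {' + '.join(b)}")
```

Each printed line is one merge, with the distance at which the two groups joined; models that merge at smaller distances sit closer together in the resulting tree.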