Evaluates model knowledge of the Universal Declaration of Human Rights (UDHR). Prompts cover the Preamble and key articles on fundamental rights (e.g., life, liberty, equality, privacy, expression). Includes a scenario to test reasoning on balancing competing rights.
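Key point coverage for each model on each prompt. The Score row gives each model's rank and its average coverage across all prompts; the first cell of each remaining row is that prompt's average coverage across all models.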
Prompts vs. Models | Claude 3.5 Haiku | Claude Opus 4 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | O4 Mini | Kimi K2 Instruct | Grok 3 | Grok 3 Mini | Grok 4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 14th 94.3% | 8th 96.5% | 16th 93.6% | 12th 94.6% | 15th 93.9% | 12th 94.6% | 6th 97.1% | 1st 98.9% | 20th 86.9% | 23rd 82.1% | 18th 91.3% | 7th 96.7% | 10th 95.3% | 2nd 98.6% | 11th 94.8% | 22nd 84.7% | 17th 92.9% | 21st 85.5% | 3rd 98.2% | 19th 90.7% | 4th 98.2% | 5th 98.1% | 9th 95.8% |
96.3% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 94% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 72% | 100% | 100% | 100% |
98.4% | 100% | 100% | 100% | 100% | 98% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 98% | 100% | 98% | 85% | 100% | 85% | 100% | 100% | 100% | 100% | 100% |
94.4% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 96% | 0% | 100% | 100% | 100% | 100% | 98% | 81% | 100% | 98% | 100% | 98% | 100% | 100% | 100% |
99.0% | 92% | 100% | 100% | 100% | 100% | 100% | 97% | 100% | 100% | 88% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
94.9% | 97% | 100% | 89% | 98% | 89% | 100% | 100% | 100% | 72% | 100% | 98% | 100% | 100% | 100% | 97% | 83% | 89% | 72% | 100% | 100% | 98% | 100% | 100% |
95.4% | 98% | 100% | 93% | 100% | 90% | 95% | 100% | 100% | 100% | 95% | 95% | 100% | 100% | 100% | 95% | 80% | 95% | 63% | 100% | 95% | 100% | 100% | 100% |
93.2% | 100% | 100% | 100% | 85% | 99% | 100% | 100% | 100% | 74% | 57% | 100% | 89% | 97% | 100% | 90% | 88% | 100% | 65% | 100% | 100% | 100% | 100% | 100% |
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
100.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
95.7% | 100% | 100% | 100% | 100% | 78% | 97% | 100% | 100% | 78% | 100% | 100% | 97% | 100% | 100% | 100% | 100% | 72% | 100% | 100% | 78% | 100% | 100% | 100% |
71.7% | 75% | 77% | 68% | 81% | 84% | 79% | 80% | 93% | 57% | 75% | 50% | 85% | 63% | 83% | 59% | 47% | 68% | 68% | 80% | 53% | 85% | 77% | 61% |
84.4% | 70% | 81% | 73% | 71% | 89% | 89% | 88% | 94% | 72% | 70% | 53% | 89% | 86% | 100% | 100% | 77% | 91% | 75% | 98% | 92% | 95% | 100% | 88% |
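
The page does not state how the Score row is derived, but the figures are consistent with an unweighted mean of each model's twelve per-prompt coverage values. The sketch below checks that reading for three columns copied from the table. The helper name `overall_score` is hypothetical, and because the table shows rounded percentages, results for other columns may differ from the published scores in the last digit.

```python
from statistics import mean

# Per-prompt coverage percentages copied from three columns of the table
# (12 prompt rows each, in table order).
coverage = {
    "Claude 3.5 Haiku": [100, 100, 100, 92, 97, 98, 100, 100, 100, 100, 75, 70],
    "Gemini 2.5 Pro": [100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 93, 94],
    "Llama 4 Maverick": [100, 100, 0, 88, 100, 95, 57, 100, 100, 100, 75, 70],
}


def overall_score(per_prompt):
    """Hypothetical helper: unweighted mean of per-prompt coverage, rounded to 0.1.

    Assumption: the dashboard weights every prompt equally.
    """
    return round(mean(per_prompt), 1)


scores = {model: overall_score(values) for model, values in coverage.items()}

# Order models from highest to lowest score (only among the three listed here).
ranking = sorted(scores, key=scores.get, reverse=True)

for position, model in enumerate(ranking, start=1):
    print(f"{position}. {model}: {scores[model]}%")
```

Under this assumption the three results are 98.9%, 94.3%, and 82.1%, which match the Score row entries for these models. The single 0% cell accounts for most of Llama 4 Maverick's gap, which shows how sensitive an unweighted mean over twelve prompts is to one failed response.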