Evaluates knowledge of the key legal frameworks, national programs, and intercultural health policies governing maternal and child health in Peru. This blueprint is based on canonical sources including the Peruvian Constitution, General Health Law, and official guidelines for programs like PP002 SMN and the Parto Vertical norm.
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | O4 Mini | Kimi K2 Instruct | Grok 3 | Grok 3 Mini | Grok 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Score | 14th 61.3% | 12th 63.4% | 19th 54.8% | 9th 65.3% | 11th 64.4% | 22nd 53.4% | 10th 64.6% | 8th 65.9% | 3rd 74.3% | 2nd 79.4% | 18th 55.5% | 25th 36.5% | 24th 43.6% | 21st 54.0% | 13th 61.8% | 6th 69.6% | 17th 57.2% | 15th 59.3% | 20th 54.6% | 23rd 52.6% | 5th 70.6% | 16th 57.5% | 4th 70.7% | 7th 66.7% | 1st 80.4% |
| 65.8% | 74% | 73% | 67% | 81% | 73% | 32% | 65% | 72% | 80% | 88% | 64% | 37% | 55% | 52% | 59% | 72% | 65% | 64% | 67% | 58% | 75% | | 64% | 68% | 77% |
| 47.5% | 51% | 50% | 29% | 56% | 47% | 38% | 47% | 56% | 49% | 65% | 46% | 57% | 42% | 39% | 57% | 52% | 29% | 44% | 46% | 29% | 41% | | 53% | 50% | 70% |
| 40.6% | 37% | 48% | 46% | 44% | 46% | 35% | 40% | 28% | 56% | 95% | 11% | 14% | 25% | 29% | 31% | 59% | 26% | 28% | 23% | 26% | 47% | 38% | 69% | 44% | 70% |
| 57.2% | 66% | 53% | 51% | 63% | 65% | 54% | 59% | 67% | 76% | 61% | 53% | 16% | 49% | 52% | 63% | 60% | 51% | 55% | 49% | 45% | 65% | | 64% | 63% | 72% |
| 78.4% | 73% | 73% | 72% | 75% | 74% | 91% | 90% | 88% | 95% | 93% | 65% | 55% | 0% | 78% | 91% | 90% | 76% | 88% | 67% | 79% | 87% | | 93% | 93% | 97% |
| 73.2% | 63% | 69% | 57% | 67% | 72% | 59% | 81% | 81% | 88% | 81% | 70% | 39% | 62% | 76% | 62% | 84% | 82% | 79% | 71% | 62% | 93% | | 84% | 81% | 93% |
| 69.1% | 64% | 77% | 62% | 72% | 74% | 65% | 71% | 69% | 76% | 74% | 79% | 39% | 72% | 52% | 70% | 70% | 71% | 58% | 60% | 70% | 86% | 77% | 67% | 67% | 84% |
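The Score row looks like an unweighted mean of each model's per-prompt coverage, ranked from highest to lowest. A minimal Python sketch of that arithmetic follows, using the per-prompt values shown in the table for three models. The function name `average_coverage` and the choice to skip missing cells (`None`) are assumptions for illustration, not the site's documented method. Because the table cells are rounded to whole percentages, recomputed averages can differ from the published Score by a few tenths of a point.

```python
from statistics import mean

# Per-prompt coverage (percent), copied from the table columns for three models.
coverage = {
    "Gemini 2.5 Pro": [88, 65, 95, 61, 93, 81, 74],
    "Gemini 2.5 Flash": [80, 49, 56, 76, 95, 88, 76],
    "Llama 4 Maverick": [37, 57, 14, 16, 55, 39, 39],
}


def average_coverage(values):
    """Unweighted mean of the per-prompt values, ignoring missing cells (None).

    Skipping missing cells is an assumption made for this sketch.
    """
    present = [v for v in values if v is not None]
    return mean(present)


# Average score per model, then model names sorted best-first.
scores = {model: average_coverage(vals) for model, vals in coverage.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for place, model in enumerate(ranking, start=1):
    print(f"{place}. {model}: {scores[model]:.1f}%")
```

The relative order of these three models matches the table (Gemini 2.5 Pro above Gemini 2.5 Flash above Llama 4 Maverick), even though the recomputed averages are only approximations of the published scores.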