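Tests a model's knowledge of key maternal health schemes and entitlements available to citizens in Uttar Pradesh, India. The evaluation is based on the canonical guidelines for Janani Suraksha Yojana (JSY), Pradhan Mantri Matru Vandana Yojana (PMMVY), Janani Shishu Suraksha Karyakram (JSSK), Pradhan Mantri Surakshit Matritva Abhiyan (PMSMA), and Surakshit Matritva Aashwasan (SUMAN), and focuses on eligibility, benefits, and access procedures.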
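Average key-point coverage for each model across all prompts. The first row gives each model's overall rank and score. Each following row is one prompt, and its first cell is that prompt's mean coverage across all models.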
Prompt (mean across models) | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Overall (rank, score) | 11th 55.1% | 21st 47.2% | 26th 44.5% | 5th 61.6% | 10th 56.1% | 18th 50.5% | 8th 57.8% | 7th 58.6% | 6th 61.3% | 15th 53.3% | 4th 61.8% | 28th 41.3% | 24th 45.2% | 30th 36.3% | 27th 43.9% | 20th 49.2% | 12th 54.8% | 29th 37.1% | 9th 57.2% | 14th 54.1% | 22nd 47.0% | 19th 49.3% | 32nd 34.8% | 3rd 61.9% | 25th 44.7% | 31st 35.1% | 16th 53.3% | 13th 54.2% | 17th 51.6% | 23rd 46.8% | 2nd 63.1% | 1st 65.1%
92.6% | 94% | 92% | 86% | 100% | 100% | 98% | 100% | 95% | 100% | 98% | 100% | 86% | 89% | 88% | 82% | 92% | 94% | 71% | 100% | 98% | 98% | 98% | 77% | 100% | 94% | 74% | 100% | 90% | 77% | 97% | 100% | 100%
58.0% | 59% | 45% | 38% | 75% | 89% | 61% | 77% | 90% | 77% | 56% | 60% | 54% | 50% | 50% | 48% | 45% | 60% | 50% | 60% | 42% | 40% | 43% | 40% | 68% | 53% | 40% | 50% | 60% | 64% | 59% | 77% | 77%
29.5% | 33% | 34% | 27% | 19% | 18% | 13% | 38% | 34% | 34% | 38% | 55% | 26% | 13% | 10% | 13% | 33% | 30% | 16% | 30% | 31% | 26% | 28% | 16% | 60% | 39% | 13% | 33% | 42% | 19% | 28% | 52% | 50%
29.2% | 36% | 6% | 22% | 35% | 39% | 41% | 31% | 38% | 31% | 31% | 31% | 16% | 36% | 26% | 44% | 38% | 45% | 19% | 35% | 35% | 19% | 22% | 6% | 27% | 7% | 12% | 38% | 44% | 36% | 27% | 31% | 35%
77.9% | 86% | 78% | 78% | 97% | 75% | 78% | 81% | 83% | 94% | 86% | 94% | 61% | 77% | 38% | 72% | 82% | 83% | 63% | 97% | 88% | 78% | 86% | 53% | 91% | 63% | 50% | 91% | 83% | 83% | 55% | 90% | 85%
19.1% | 22% | 28% | 18% | 44% | 16% | 13% | 21% | 13% | 33% | 12% | 32% | 6% | 7% | 7% | 5% | 7% | 18% | 5% | 22% | 32% | 22% | 19% | 16% | 27% | 13% | 22% | 8% | 7% | 30% | 16% | 30% | 44%
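The overall score appears to be each model's unweighted mean coverage across the six prompts. Recomputing it from the rounded per-prompt figures lands within a few tenths of a point of the published scores and preserves the top-five order. The following is a minimal Python sketch of that assumed aggregation, using values copied from the table. The dictionary name and the helper functions are hypothetical and are not part of the evaluation tooling.

```python
# Per-prompt key-point coverage (%) copied from the table, in row order.
# The published per-prompt values are rounded to whole percentages, so the
# recomputed means can differ from the official overall scores by a few
# tenths of a point.
PER_PROMPT_COVERAGE = {
    "Grok 4": [100, 77, 50, 35, 85, 44],
    "Grok 3": [100, 77, 52, 31, 90, 30],
    "GPT 5": [100, 68, 60, 27, 91, 27],
    "Gemini 2.5 Pro": [100, 60, 55, 31, 94, 32],
    "Claude Opus 4.1": [100, 75, 19, 35, 97, 44],
}


def overall_score(coverages: list[float]) -> float:
    """Unweighted mean of one model's per-prompt coverage (assumed aggregation)."""
    return sum(coverages) / len(coverages)


def rank_models(table: dict[str, list[float]]) -> list[tuple[int, str, float]]:
    """Return (rank, model, score) tuples, highest score first."""
    scored = sorted(
        ((name, overall_score(values)) for name, values in table.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    return [(i + 1, name, score) for i, (name, score) in enumerate(scored)]


if __name__ == "__main__":
    for rank, name, score in rank_models(PER_PROMPT_COVERAGE):
        print(f"{rank}. {name}: {score:.1f}%")
```

Running the script lists the five models in the same order as their published ranks 1 to 5. The printed scores differ slightly from the table because the inputs are the rounded per-prompt percentages.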