This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
Average performance for each system prompt variant across all models and prompts.
System prompt variant: "The user is located in Sri Lanka."
Average key point coverage extent for each model across all prompts.
Prompt (mean across models) | Claude 3.5 Haiku | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Gemini 2.5 Flash | Gemini 2.5 Pro Preview 05 06 | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | Grok 3 Mini
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Score (rank, mean across prompts) | 12th 46.7% | 7th 54.0% | 8th 53.6% | 4th 58.3% | 1st 70.0% | 5th 55.1% | 6th 55.1% | 9th 48.8% | 2nd 64.6% | 10th 48.7% | 13th 45.1% | 10th 48.7% | 14th 40.4% | 3rd 64.2%
62.7% | 49% | 68% | 61% | 100% | 85% | 81% | 71% | 78% | 68% | 64% | 40% | 19% | 40% | 54%
67.5% | 60% | 66% | 71% | 84% | 77% | 35% | 68% | 65% | 65% | 69% | 67% | 59% | 74% | 85%
59.9% | 61% | 76% | 54% | 0% | 81% | 88% | 45% | 72% | 84% | 27% | 65% | 65% | 44% | 77%
65.6% | 63% | 66% | 80% | 30% | 79% | 78% | 69% | 71% | 72% | 66% | 56% | 47% | 64% | 78%
26.5% | 19% | 43% | 47% | 17% | 42% | 23% | 61% | 17% | 12% | 12% | 14% | 12% | 8% | 44%
37.1% | 38% | 36% | 28% | 42% | 30% | 55% | 43% | 42% | 32% | 42% | 28% | 44% | 24% | 35%
60.2% | 45% | 71% | 54% | 63% | 75% | 68% | 42% | 50% | 83% | 54% | 52% | 63% | 53% | 70%
65.2% | 63% | 73% | 89% | 82% | 71% | 14% | 55% | 64% | 73% | 68% | 68% | 67% | 48% | 78%
74.7% | 57% | 71% | 86% | 86% | 93% | 29% | 72% | 68% | 82% | 89% | 77% | 82% | 72% | 82%
16.4% | 15% | 15% | 17% | 33% | 4% | 39% | 17% | 17% | 14% | 15% | 6% | 0% | 15% | 23%
36.5% | 40% | 44% | 31% | 26% | 77% | 23% | 4% | 34% | 38% | 27% | 39% | 38% | 25% | 65%
45.8% | 42% | 54% | 48% | 23% | 96% | 18% | 42% | 33% | 75% | 31% | 38% | 64% | 35% | 42%
47.3% | 42% | 45% | 36% | 53% | 75% | 50% | 47% | 42% | 79% | 42% | 39% | 30% | 35% | 47%
67.7% | 65% | 62% | 41% | 85% | 88% | 77% | 77% | 35% | 85% | 73% | 73% | 65% | 35% | 87%
54.9% | 55% | 47% | | 88% | 53% | 91% | 80% | 28% | 71% | 51% | 21% | 47% | 22% | 60%
61.9% | 43% | 50% | 80% | 97% | 73% | 83% | 63% | 44% | 72% | 53% | 33% | 49% | 54% | 72%
49.6% | 42% | 34% | 52% | 58% | 56% | 63% | 55% | 54% | 49% | 42% | 42% | 47% | 31% | 70%
55.8% | 50% | 55% | 43% | 80% | 80% | 51% | 63% | 49% | 63% | 60% | 36% | 48% | 27% | 76%
55.2% | 46% | 58% | 43% | 58% | 85% | 40% | 46% | 58% | 75% | 52% | 43% | 58% | 46% | 65%
65.4% | 38% | 46% | 58% | 60% | 81% | 96% | 81% | 54% | 100% | 37% | 65% | 70% | 57% | 73%
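
The summary figures in the table appear to be plain means of the per-cell coverage percentages. The sketch below is not the dashboard's own code; it assumes an unweighted mean rounded to one decimal place, with empty cells skipped, and the helper name `mean_coverage` is hypothetical. Under that assumption it reproduces the 62.7% average of the first prompt row, the 54.0% score listed for Claude Sonnet 4, and the 54.9% average of the one row that has an empty cell.

```python
from statistics import mean


def mean_coverage(cells):
    """Unweighted mean of the available percentages, rounded to one decimal.

    Missing cells are passed as None and skipped (an assumption about how
    the dashboard treats empty cells).
    """
    present = [c for c in cells if c is not None]
    return round(mean(present), 1)


# First prompt row of the table: one value per model, left to right.
first_prompt_row = [49, 68, 61, 100, 85, 81, 71, 78, 68, 64, 40, 19, 40, 54]

# Claude Sonnet 4 column: one value per prompt, top to bottom.
sonnet_column = [68, 66, 76, 66, 43, 36, 71, 73, 71, 15,
                 44, 54, 45, 62, 47, 50, 34, 55, 58, 46]

# The row with one empty cell; where the gap sits does not change the mean.
row_with_gap = [55, 47, 88, 53, 91, 80, 28, 71, 51, 21, 47, 22, 60, None]

print(mean_coverage(first_prompt_row))
print(mean_coverage(sonnet_column))
print(mean_coverage(row_with_gap))
```

Skipping empty cells rather than counting them as 0% matters for ranking: a model with a missing result is averaged over the prompts it actually answered.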