This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Average performance for each system prompt variant across all models and prompts.
[No System Prompt]
The user is located in Sri Lanka.
The user is a citizen of Sri Lanka.
Average key point coverage, broken down by system prompt variant.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4 | Claude Sonnet 4 | Command A | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | GPT Oss 120b | GPT Oss 20b | O4 Mini | Grok 3 Mini | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 22nd 28.6% | 16th 32.4% | 26th 23.3% | 12th 35.0% | 13th 34.4% | 24th 25.5% | 11th 37.8% | 6th 42.0% | 5th 42.3% | 1st 51.3% | 17th 31.4% | 23rd 28.4% | 20th 30.1% | 15th 33.2% | 9th 39.0% | 14th 33.8% | 19th 30.3% | 10th 38.2% | 25th 23.4% | 21st 29.8% | 7th 40.7% | 18th 30.9% | 4th 44.7% | 8th 39.0% | 3rd 46.0% | 2nd 49.8% | |
62.3% | 54% | 74% | 55% | 77% | 33% | 30% | 88% | 82% | 84% | 94% | 65% | 53% | 46% | 68% | 75% | 51% | 51% | 51% | 23% | 33% | 69% | 49% | 76% | 58% | 86% | 95% | |
57.0% | 34% | 62% | 43% | 45% | 67% | 54% | 77% | 75% | 54% | 93% | 31% | 39% | 45% | 39% | 67% | 49% | 33% | 52% | 38% | 53% | 79% | 56% | 76% | 67% | 77% | 77% | |
62.2% | 79% | 78% | 50% | 80% | 84% | 0% | 0% | 78% | 79% | 99% | 76% | 61% | 71% | 67% | 86% | 31% | 30% | 70% | 40% | 77% | 54% | 4% | 83% | 70% | 80% | 90% | |
68.8% | 57% | 66% | 59% | 71% | 82% | 57% | 82% | 71% | 68% | 90% | 46% | 65% | 68% | 55% | 69% | 57% | 60% | 65% | 66% | 57% | 91% | 62% | 71% | 75% | 84% | 96% | |
3.9% | 1% | 3% | 1% | 11% | 3% | 2% | 6% | 6% | 2% | 1% | 2% | 1% | 3% | 2% | 13% | 3% | 3% | 3% | 1% | 6% | 8% | 3% | 4% | 2% | 9% | 2% | |
59.9% | 51% | 66% | 43% | 45% | 73% | 41% | 58% | 77% | 84% | 83% | 49% | 55% | 35% | 71% | 58% | 51% | 55% | 69% | 51% | 60% | 60% | 65% | 64% | 82% | 58% | 53% | |
20.5% | 17% | 8% | 15% | 27% | 15% | 8% | 17% | 14% | 39% | 20% | 17% | 12% | 5% | 10% | 8% | 33% | 35% | 35% | 8% | 15% | 31% | 26% | 53% | 17% | 27% | 20% | |
31.5% | 29% | 27% | 24% | 39% | 29% | 2% | 36% | 44% | 29% | 41% | 20% | 43% | 20% | 23% | 26% | 33% | 19% | 44% | 22% | 30% | 42% | 27% | 37% | 45% | 43% | 44% | |
65.9% | 60% | 58% | 50% | 64% | 67% | 60% | 74% | 64% | 68% | 72% | 54% | 56% | 54% | 68% | 74% | 70% | 69% | 74% | 52% | 66% | 76% | 76% | 74% | 66% | 76% | 71% | |
0.2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 1% | 0% | 0% | 0% | 0% | 0% | 0% | 2% | 0% | 0% | 0% | 1% | 0% | |
21.6% | 12% | 0% | 11% | 19% | 0% | 43% | 25% | 2% | 31% | 67% | 6% | 0% | 8% | 10% | 4% | 16% | 17% | 40% | 15% | 13% | 14% | 21% | 47% | 23% | 42% | 75% | |
18.9% | 16% | 10% | 7% | 27% | 18% | 9% | 20% | 23% | 18% | 28% | 34% | 2% | 24% | 27% | 21% | 33% | 18% | 39% | 7% | 5% | 18% | 0% | 27% | 20% | 23% | 17% | |
25.5% | 33% | 17% | 32% | 29% | 25% | 12% | 21% | 13% | 36% | 30% | 24% | 11% | 33% | 8% | 22% | 32% | 39% | 31% | 13% | 9% | 27% | 30% | 32% | 30% | 40% | 35% | |
7.7% | 7% | 2% | 11% | 7% | 9% | 11% | 8% | 9% | 0% | 20% | 0% | 1% | 0% | 0% | 17% | 18% | 18% | 2% | 6% | 7% | 2% | 2% | 0% | 11% | 12% | 19% | |
21.0% | 23% | 19% | 11% | 22% | 27% | 22% | 19% | 25% | 23% | 32% | 25% | 13% | 16% | 13% | 23% | 16% | 17% | 17% | 16% | 14% | 29% | 25% | 23% | 29% | 14% | 33% | |
54.8% | 25% | 47% | 18% | 51% | 48% | 54% | 91% | 95% | 75% | 98% | 42% | 45% | 49% | 52% | 68% | 53% | 33% | 51% | 22% | 39% | 58% | 53% | 62% | 50% | 55% | 91% | |
33.3% | 17% | 17% | 19% | 15% | 36% | 15% | 38% | 44% | 29% | 48% | 39% | 28% | 41% | 44% | 41% | 28% | 21% | 37% | 15% | 31% | 39% | 47% | 45% | 33% | 47% | 53% | |
28.8% | 7% | 24% | 1% | 25% | 26% | 8% | 29% | 48% | 48% | 46% | 42% | 22% | 22% | 40% | 32% | 26% | 22% | 24% | 5% | 19% | 41% | 0% | 50% | 31% | 66% | 44% | |
16.5% | 4% | 24% | 4% | 4% | 2% | 35% | 14% | 19% | 19% | 1% | 5% | 14% | 8% | 23% | 24% | 20% | 19% | 18% | 21% | 16% | 21% | 17% | 21% | 27% | 27% | 21% | |
48.5% | 45% | 46% | 12% | 42% | 45% | 47% | 53% | 50% | 58% | 63% | 51% | 47% | 55% | 42% | 51% | 55% | 47% | 42% | 47% | 47% | 54% | 54% | 49% | 44% | 54% | 60% |
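
The leading percentage in each prompt row appears to be the unweighted mean of that row's per-model scores. The sketch below checks this for the first prompt row and ranks a few models from the Score row. The function names `row_average` and `rank_models` are illustrative, not part of the dashboard, and because the table shows rounded values, other rows may only agree to within rounding error.

```python
from statistics import mean

# Per-model coverage (%) copied from the first prompt row of the table above.
first_row = [54, 74, 55, 77, 33, 30, 88, 82, 84, 94, 65, 53, 46,
             68, 75, 51, 51, 51, 23, 33, 69, 49, 76, 58, 86, 95]

# A few entries copied from the Score row (overall average per model, %).
overall = {
    "O4 Mini": 44.7,
    "Grok 3": 46.0,
    "Gemini 2.5 Pro": 51.3,
    "Grok 4": 49.8,
}


def row_average(scores):
    """Unweighted mean across models, rounded to one decimal place."""
    return round(mean(scores), 1)


def rank_models(scores_by_model):
    """Return model names sorted from highest to lowest score."""
    return sorted(scores_by_model, key=scores_by_model.get, reverse=True)


print(row_average(first_row))   # 62.3, matching the row's leading cell
print(rank_models(overall))     # Gemini 2.5 Pro, Grok 4, Grok 3, O4 Mini
```

Ranking on the rounded values cannot reproduce the table's order when two models show the same figure (for example, two models both listed at 39.0%), since the dashboard presumably ranks on unrounded scores.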