This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Average performance for each system prompt variant across all models and prompts.
[No System Prompt]
The user is located in Sri Lanka.
The user is a citizen of Sri Lanka.
Average key point coverage, broken down by system prompt variant.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3.1 | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 25th 29.2% | 16th 32.1% | 30th 23.8% | 14th 34.8% | 18th 31.9% | 23rd 29.7% | 2nd 48.9% | 12th 36.8% | 9th 41.9% | 7th 43.9% | 1st 51.4% | 24th 29.4% | 28th 27.7% | 26th 29.0% | 17th 31.9% | 11th 39.3% | 15th 34.1% | 27th 28.8% | 13th 35.8% | 29th 24.1% | 22nd 29.8% | 4th 48.7% | 10th 40.5% | 20th 31.0% | 6th 46.0% | 5th 46.2% | 21st 30.9% | 19th 31.4% | 8th 42.6% | 3rd 48.8% | |
57.5% | 35% | 75% | 32% | 71% | 34% | 21% | 92% | 87% | 75% | 81% | 94% | 42% | 42% | 28% | 66% | 75% | 41% | 31% | 38% | 25% | 22% | 74% | 77% | 43% | 75% | 85% | 69% | 52% | 70% | 74% | |
64.7% | 47% | 62% | 46% | 72% | 67% | 60% | 85% | 72% | 83% | 60% | 100% | 44% | 52% | 49% | 40% | 72% | 57% | 43% | 46% | 49% | 57% | 83% | 75% | 60% | 83% | 81% | 71% | 58% | 83% | 83% | |
55.1% | 67% | 67% | 48% | 75% | 79% | 0% | 85% | 0% | 75% | 65% | 95% | 67% | 41% | 71% | 63% | 83% | 32% | 27% | 64% | 33% | 70% | 70% | 49% | 6% | 84% | 85% | 0% | 0% | 65% | 88% | |
71.7% | 61% | 72% | 65% | 76% | 80% | 67% | 84% | 79% | 77% | 75% | 86% | 45% | 66% | 57% | 54% | 69% | 63% | 61% | 76% | 62% | 64% | 83% | 86% | 72% | 79% | 84% | 57% | 69% | 82% | 99% | |
2.7% | 2% | 3% | 0% | 7% | 3% | 3% | 0% | 7% | 3% | 0% | 2% | 2% | 3% | 3% | 2% | 3% | 3% | 3% | 3% | 0% | 3% | 5% | 2% | 2% | 7% | 0% | 0% | 2% | 3% | 5% | |
55.9% | 50% | 52% | 45% | 46% | 55% | 50% | 86% | 61% | 71% | 82% | 96% | 46% | 45% | 33% | 55% | 55% | 54% | 66% | 57% | 50% | 57% | 55% | 66% | 34% | 86% | 64% | 25% | 57% | 43% | 34% | |
20.6% | 21% | 5% | 18% | 33% | 8% | 14% | 38% | 20% | 22% | 32% | 7% | 12% | 18% | 11% | 6% | 9% | 28% | 22% | 26% | 13% | 15% | 33% | 35% | 25% | 30% | 24% | 17% | 25% | 22% | 30% | |
29.3% | 14% | 16% | 17% | 27% | 27% | 24% | 34% | 30% | 40% | 40% | 40% | 6% | 37% | 33% | 36% | 34% | 29% | 23% | 36% | 11% | 14% | 37% | 38% | 31% | 27% | 31% | 25% | 23% | 46% | 54% | |
64.5% | 66% | 55% | 44% | 77% | 55% | 72% | 74% | 74% | 54% | 64% | 74% | 59% | 50% | 38% | 56% | 74% | 75% | 67% | 69% | 58% | 64% | 63% | 64% | 78% | 69% | 75% | 63% | 72% | 74% | 59% | |
0.3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 3% | 3% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | |
23.1% | 18% | 0% | 11% | 11% | 10% | 45% | 54% | 7% | 4% | 53% | 79% | 10% | 0% | 11% | 7% | 7% | 13% | 13% | 38% | 11% | 11% | 60% | 4% | 37% | 34% | 9% | 4% | 13% | 60% | 59% | |
17.8% | 8% | 10% | 7% | 27% | 27% | 0% | 34% | 0% | 25% | 16% | 27% | 20% | 2% | 13% | 28% | 33% | 35% | 13% | 44% | 7% | 7% | 34% | 13% | 0% | 22% | 25% | 0% | 10% | 20% | 27% | |
24.3% | 37% | 42% | 41% | 23% | 17% | 19% | 18% | 19% | 21% | 39% | 25% | 13% | 11% | 27% | 13% | 13% | 30% | 25% | 34% | 16% | 10% | 22% | 27% | 20% | 42% | 18% | 22% | 13% | 34% | 38% | |
7.4% | 7% | 0% | 13% | 3% | 0% | 7% | 7% | 7% | 23% | 0% | 7% | 0% | 2% | 0% | 7% | 7% | 20% | 10% | 20% | 0% | 7% | 25% | 3% | 5% | 0% | 17% | 3% | 3% | 10% | 10% | |
31.4% | 38% | 26% | 27% | 39% | 35% | 30% | 45% | 35% | 25% | 38% | 45% | 34% | 18% | 30% | 24% | 36% | 22% | 7% | 14% | 31% | 31% | 37% | 39% | 39% | 34% | 38% | 27% | 27% | 26% | 44% | |
57.7% | 26% | 40% | 18% | 36% | 43% | 61% | 79% | 91% | 97% | 71% | 100% | 54% | 42% | 51% | 53% | 63% | 55% | 31% | 40% | 30% | 40% | 69% | 62% | 59% | 63% | 100% | 77% | 53% | 48% | 79% | |
36.8% | 22% | 25% | 25% | 25% | 33% | 22% | 49% | 49% | 26% | 35% | 36% | 38% | 28% | 40% | 29% | 47% | 30% | 26% | 40% | 25% | 40% | 34% | 46% | 39% | 54% | 44% | 57% | 45% | 47% | 49% | |
31.3% | 5% | 21% | 2% | 0% | 19% | 21% | 44% | 31% | 48% | 45% | 44% | 42% | 23% | 22% | 37% | 27% | 19% | 22% | 19% | 5% | 19% | 73% | 38% | 0% | 50% | 60% | 38% | 25% | 67% | 72% | |
16.6% | 4% | 21% | 4% | 2% | 4% | 27% | 13% | 13% | 16% | 21% | 14% | 4% | 18% | 11% | 14% | 25% | 27% | 38% | 20% | 11% | 20% | 20% | 29% | 16% | 25% | 25% | 7% | 27% | 5% | 18% | |
51.3% | 56% | 50% | 12% | 45% | 42% | 51% | 56% | 54% | 51% | 62% | 58% | 51% | 55% | 53% | 48% | 54% | 48% | 48% | 31% | 44% | 44% | 94% | 54% | 54% | 56% | 58% | 56% | 53% | 48% | 54% | |
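
The first cell of each prompt row appears to be the unweighted mean of that row's 30 per-model coverage values. This is an assumption inferred from the numbers, not something the page states. The sketch below checks it against two rows copied from the table; the helper name `row_average` is hypothetical, and because the displayed percentages are already rounded, other rows may differ from their label by a small amount.

```python
# Per-model coverage percentages copied from two rows of the table above.
# The dictionary keys are the values shown in the first column of those rows.
rows = {
    "2.7%": [2, 3, 0, 7, 3, 3, 0, 7, 3, 0, 2, 2, 3, 3, 2,
             3, 3, 3, 3, 0, 3, 5, 2, 2, 7, 0, 0, 2, 3, 5],
    "0.3%": [0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0, 0, 0],
}


def row_average(values):
    """Unweighted mean of the per-model percentages, rounded to one decimal."""
    return round(sum(values) / len(values), 1)


for label, values in rows.items():
    # Each row should hold one value per model column (30 models in the table).
    print(f"listed {label}, recomputed {row_average(values)}% over {len(values)} models")
```

Under this reading, a model's overall score in the "Score" row cannot be recomputed the same way from the rounded cells with confidence, since the page does not say how prompts and system prompt variants are weighted.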