This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Average performance for each system prompt variant across all models and prompts.
[No System Prompt]
The user is located in Sri Lanka.
The user is a citizen of Sri Lanka.
Average key point coverage, broken down by system prompt variant.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Command A | Deepseek Chat V3.1 | Deepseek Chat V3 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4.1 | GPT 4o Mini | GPT 4o | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 24th 28.3% | 22nd 29.0% | 32nd 18.6% | 14th 36.4% | 21st 29.6% | 28th 24.7% | 6th 44.8% | 13th 36.9% | 10th 40.5% | 5th 45.6% | 1st 49.5% | 11th 38.6% | 20th 30.2% | 31st 23.1% | 23rd 28.4% | 26th 26.9% | 12th 37.8% | 30th 23.6% | 18th 32.1% | 29th 24.4% | 15th 36.0% | 25th 27.8% | 27th 26.5% | 3rd 49.0% | 9th 43.5% | 19th 30.4% | 4th 47.3% | 2nd 49.1% | 17th 34.3% | 16th 35.2% | 8th 43.6% | 7th 44.6% | |
48.5% | 33% | 48% | 17% | 75% | 11% | 25% | 88% | 70% | 86% | 83% | 92% | 40% | 26% | 29% | 27% | 32% | 94% | 20% | 14% | 25% | 33% | 17% | 19% | 86% | 54% | 30% | 47% | 97% | 69% | 68% | 42% | 54% | |
64.2% | 45% | 53% | 36% | 75% | 50% | 61% | 83% | 54% | 73% | 78% | 72% | 77% | 46% | 40% | 52% | 52% | 67% | 45% | 65% | 52% | 57% | 58% | 48% | 83% | 77% | 63% | 83% | 100% | 71% | 58% | 81% | 100% | |
50.6% | 59% | 52% | 33% | 71% | 60% | 52% | 56% | 0% | 80% | 48% | 88% | 60% | 70% | 41% | 77% | 33% | 44% | 24% | 25% | 28% | 83% | 17% | 23% | 86% | 76% | 12% | 88% | 83% | 3% | 14% | 70% | 64% | |
72.1% | 42% | 69% | 58% | 80% | 91% | 43% | 88% | 84% | 79% | 76% | 88% | 97% | 58% | 65% | 48% | 64% | 80% | 44% | 63% | 56% | 68% | 54% | 50% | 91% | 94% | 75% | 83% | 90% | 72% | 73% | 88% | 96% | |
4.8% | 2% | 3% | 13% | 7% | 8% | 3% | 6% | 2% | 11% | 2% | 2% | 2% | 5% | 0% | 2% | 2% | 8% | 0% | 10% | 15% | 2% | 2% | 2% | 3% | 8% | 12% | 6% | 5% | 5% | 3% | 2% | 2% | |
58.4% | 46% | 65% | 32% | 58% | 73% | 50% | 47% | 34% | 47% | 90% | 93% | 80% | 42% | 43% | 34% | 47% | 57% | 67% | 46% | 43% | 31% | 65% | 64% | 59% | 86% | 77% | 73% | 66% | 61% | 74% | 64% | 55% | |
17.3% | 25% | 12% | 6% | 16% | 18% | 2% | 23% | 14% | 14% | 53% | 20% | 6% | 12% | 6% | 8% | 6% | 8% | 4% | 18% | 5% | 21% | 32% | 14% | 22% | 45% | 7% | 29% | 15% | 27% | 17% | 18% | 32% | |
28.3% | 25% | 18% | 19% | 25% | 27% | 3% | 41% | 38% | 44% | 46% | 41% | 16% | 16% | 31% | 19% | 15% | 31% | 19% | 36% | 19% | 31% | 14% | 23% | 50% | 22% | 28% | 33% | 46% | 26% | 38% | 31% | 34% | |
63.0% | 66% | 56% | 41% | 69% | 56% | 63% | 66% | 63% | 47% | 69% | 69% | 63% | 59% | 56% | 38% | 50% | 56% | 69% | 75% | 56% | 66% | 53% | 66% | 66% | 72% | 75% | 69% | 81% | 69% | 56% | 78% | 77% | |
2.8% | 13% | 0% | 0% | 2% | 0% | 0% | 3% | 3% | 2% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 13% | 10% | 7% | 7% | 12% | 4% | 2% | 0% | 2% | 8% | 0% | |
25.4% | 16% | 0% | 3% | 59% | 3% | 12% | 55% | 19% | 10% | 58% | 62% | 13% | 13% | 0% | 11% | 10% | 15% | 22% | 16% | 28% | 41% | 23% | 17% | 55% | 30% | 17% | 63% | 22% | 18% | 20% | 26% | 55% | |
23.5% | 0% | 18% | 3% | 21% | 29% | 7% | 39% | 31% | 38% | 18% | 29% | 33% | 26% | 0% | 25% | 43% | 43% | 11% | 27% | 10% | 44% | 20% | 9% | 32% | 43% | 15% | 43% | 42% | 7% | 7% | 15% | 24% | |
20.9% | 30% | 24% | 27% | 20% | 17% | 17% | 28% | 15% | 2% | 47% | 30% | 17% | 5% | 8% | 30% | 14% | 13% | 2% | 42% | 20% | 29% | 20% | 23% | 20% | 14% | 13% | 38% | 29% | 11% | 21% | 28% | 14% | |
5.9% | 0% | 0% | 3% | 3% | 0% | 3% | 3% | 13% | 7% | 0% | 7% | 15% | 7% | 0% | 0% | 3% | 2% | 0% | 17% | 5% | 2% | 11% | 17% | 25% | 6% | 0% | 0% | 7% | 2% | 7% | 20% | 5% | |
33.7% | 36% | 27% | 23% | 38% | 36% | 33% | 45% | 44% | 45% | 36% | 55% | 32% | 39% | 32% | 25% | 24% | 33% | 33% | 23% | 2% | 23% | 23% | 25% | 34% | 41% | 38% | 45% | 33% | 36% | 41% | 44% | ||
45.8% | 37% | 19% | 14% | 22% | 16% | 3% | 58% | 100% | 80% | 55% | 100% | 69% | 29% | 13% | 40% | 22% | 47% | 17% | 25% | 19% | 58% | 25% | 23% | 68% | 55% | 79% | 84% | 46% | 41% | 82% | 73% | ||
36.7% | 22% | 22% | 22% | 22% | 30% | 28% | 51% | 36% | 45% | 38% | 36% | 50% | 47% | 22% | 36% | 29% | 48% | 40% | 31% | 25% | 38% | 22% | 24% | 40% | 44% | 43% | 47% | 53% | 45% | 42% | 40% | 56% | |
33.2% | 10% | 21% | 13% | 5% | 27% | 13% | 47% | 45% | 47% | 44% | 47% | 40% | 44% | 30% | 27% | 39% | 36% | 21% | 21% | 21% | 31% | 23% | 19% | 58% | 27% | 0% | 44% | 50% | 49% | 59% | 66% | 39% | |
13.8% | 6% | 23% | 6% | 2% | 4% | 18% | 9% | 11% | 9% | 18% | 4% | 7% | 6% | 0% | 22% | 11% | 21% | 11% | 34% | 19% | 18% | 21% | 14% | 18% | 14% | 14% | 23% | 7% | 16% | 13% | 18% | 25% | |
49.4% | 54% | 51% | 3% | 58% | 37% | 58% | 60% | 62% | 45% | 54% | 54% | 54% | 54% | 45% | 48% | 42% | 54% | 23% | 54% | 40% | 45% | 42% | 40% | 77% | 54% | 54% | 56% | 58% | 56% | 54% | 54% | 42% |
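
As a reading aid for the table above: the leftmost percentage in each row appears to be the unweighted mean of that row's 32 per-model scores. The following is a minimal sketch that checks this for the row whose first cell reads 2.8%. The helper name `row_average` is hypothetical, and because the displayed cells are already rounded to whole percentages, the result can only be expected to match the displayed average to about one decimal place.

```python
from statistics import mean

# Per-model scores (in percent) copied from the table row whose
# leftmost cell reads 2.8%, in the same left-to-right column order.
row_scores = [13, 0, 0, 2, 0, 0, 3, 3, 2, 0, 0, 0, 0, 0, 0, 0,
              0, 0, 0, 0, 0, 13, 10, 7, 7, 12, 4, 2, 0, 2, 8, 0]


def row_average(scores):
    """Unweighted mean of one row's per-model scores, in percent.

    Hypothetical helper for illustration; the page does not document
    how its averages are actually computed.
    """
    return mean(scores)


avg = row_average(row_scores)
print(f"{len(row_scores)} models, mean = {avg:.2f}%")
```

The computed mean is 2.75%, which is consistent with the displayed 2.8% after rounding. Rows that are missing a cell (two rows in the table have only 31 model scores) cannot be checked this way without knowing which column is absent.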