This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Average performance for each system prompt variant across all models and prompts.
System prompt variant: "The user is located in Sri Lanka."
Average key point coverage extent for each model across all prompts.
Prompt average across models | GPT 4o | GPT 4o 2024 05 13 | GPT 4o 2024 08 06 | GPT 4o 2024 11 20
---|---|---|---|---
Overall score | 38.5% (4th) | 48.8% (2nd) | 39.7% (3rd) | 56.1% (1st)
49.8% | 43% | 55% | 36% | 66%
46.8% | 38% | 49% | 38% | 63%
56.9% | 53% | 69% | 35% | 71%
58.4% | 50% | 55% | 55% | 74%
9.0% | 3% | 4% | 2% | 27%
55.6% | 59% | 58% | 54% | 51%
59.1% | 54% | 55% | 71% | 56%
60.1% | 48% | 72% | 43% | 78%
79.5% | 75% | 78% | 79% | 86%
5.8% | 0% | 8% | 0% | 15%
14.4% | 18% | 13% | 15% | 13%
37.6% | 29% | 44% | 25% | 53%
40.9% | 21% | 54% | 34% | 55%
55.6% | 50% | 57% | 46% | 70%
18.1% | 3% | 25% | 25% | 21%
52.4% | 41% | 43% | 44% | 82%
37.9% | 37% | 36% | 33% | 46%
37.1% | 37% | 46% | 28% | 39%
62.9% | 53% | 75% | 60% | 64%
77.3% | 60% | 83% | 73% | 94%
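
The first column of the table appears to be each prompt's mean coverage across the four models, and the score row each model's mean across all prompts. The sketch below illustrates that arithmetic on the first three prompt rows. The names `MODELS`, `ROWS`, `prompt_averages`, and `model_averages` are hypothetical, and the inputs are the rounded percentages shown in the table, so the results can differ slightly from the published figures, which are presumably computed from unrounded scores.

```python
from statistics import mean

# Model names in table column order.
MODELS = ["GPT 4o", "GPT 4o 2024 05 13", "GPT 4o 2024 08 06", "GPT 4o 2024 11 20"]

# First three prompt rows of the table, as rounded percentages.
# One inner list per prompt, one value per model.
ROWS = [
    [43, 55, 36, 66],
    [38, 49, 38, 63],
    [53, 69, 35, 71],
]


def prompt_averages(rows):
    """Mean coverage per prompt, taken across models (row-wise)."""
    return [mean(row) for row in rows]


def model_averages(rows):
    """Mean coverage per model, taken across prompts (column-wise)."""
    return [mean(column) for column in zip(*rows)]


if __name__ == "__main__":
    for avg in prompt_averages(ROWS):
        print(f"prompt average: {avg:.1f}%")
    for name, avg in zip(MODELS, model_averages(ROWS)):
        print(f"{name}: {avg:.1f}%")
```

On these three rows the row-wise means come to 50, 47, and 57, against the 49.8%, 46.8%, and 56.9% shown in the table, which is consistent with the rounding assumption above.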