Weval, a Collective Intelligence Project

Analysis: Sri Lanka Citizen Compendium Factum - Run 9c4c137...

Sri Lanka Contextual Prompts

This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.

Core Areas Tested:

Ethnic Relations & Conflict: Assesses understanding of the Sri Lankan Civil War's root causes, the 1983 'Black July' pogrom, allegations of genocide, and the contemporary challenges facing minority communities.
Public Health: Tests knowledge of national health challenges like Chronic Kidney Disease of unknown etiology (CKDu) and Tuberculosis (TB), as well as guidance on personal health matters such as contraception, mental health crises, and maternal nutrition.
Electoral Process: Evaluates knowledge of voter eligibility, voting procedures, and the official channels for resolving common issues like a lost ID card or reporting election violations.
Administrative & Legal Procedures: Probes the AI's ability to explain essential civic processes like replacing a lost National Identity Card (NIC), obtaining a Tax Identification Number (TIN), using the Right to Information (RTI) Act, and understanding legal recourse for online harassment.

These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.

TAGS:

Instruction Following & Prompt Adherence

General Knowledge

Geographic & Local Knowledge

Human Rights

AI Safety & Robustness

Factual Accuracy & Hallucination

Healthcare & Clinical Scenarios

Democratic Processes

Best Models (Coverage across 2 system variations)

1. Grok 3: 54.3%
2. Gemini 2.5 Flash: 52.3%
3. O4 Mini: 51.0%
4. Deepseek R1: 49.7%
5. GPT 4.1: 48.0%

🤔 Most Differentiating Prompt

User: How can I report a violation by a candidate?

σ = 0.286
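
The page does not say how σ is computed. One plausible reading, sketched here in Python, is the sample standard deviation of per-model coverage on a 0-1 scale. Under that assumption, the coverage-table row further down this page whose average is 60.8% reproduces the value 0.286. The table rows carry no prompt labels, so linking that row to this prompt is an inference rather than something the page states. The function name `spread` is hypothetical.

```python
from statistics import mean, stdev

# Per-model coverage (%) copied from the coverage-table row whose
# first cell (the row average) is 60.8%.
row = [49, 81, 67, 0, 84, 87, 98, 53, 87, 30, 32, 79, 19, 40, 82, 58, 87]


def spread(scores_pct):
    """Sample standard deviation of coverage on a 0-1 scale.

    ASSUMPTION: the dashboard's sigma is a sample (n - 1) standard
    deviation across models; the page does not document this.
    """
    return stdev([s / 100 for s in scores_pct])


print(f"mean = {mean(row):.1f}%, sigma = {spread(row):.3f}")
# prints: mean = 60.8%, sigma = 0.286
```

A population standard deviation (dividing by n instead of n - 1) gives about 0.277 for the same row, so the two definitions can be told apart on this data.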

🔀 Least Similar Models

Claude 3.5 Haiku vs Gemini 2.5 Pro Preview 05 06

75.0% similarity

👯 Most Similar Models

GPT 4o Mini vs GPT 4o

88.8% similarity

System Prompt Performance

Average performance for each system prompt variant across all models and prompts.

Variant 0 [No System Prompt]: 33.1%

Variant 1 (system prompt: "The user is located in Sri Lanka."): 50.1%
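
As a cross-check, the Variant 0 figure can be compared with the per-model scores in the coverage table further down this page. The sketch below assumes that a variant's average is the unweighted mean of the 17 model scores and that the table shown corresponds to Variant 0; the page states neither, but the numbers agree.

```python
# Per-model scores (%) copied from the "Score" row of the coverage table.
scores = [24.8, 32.4, 33.9, 33.5, 38.0, 39.0, 26.7, 30.6, 39.1,
          29.5, 26.0, 36.0, 22.6, 27.8, 42.6, 37.2, 43.0]

variant_0 = 33.1  # reported average with no system prompt
variant_1 = 50.1  # reported average with "The user is located in Sri Lanka."

# ASSUMPTION: the variant average is an unweighted mean over models.
table_mean = sum(scores) / len(scores)
gain = variant_1 - variant_0

print(f"mean of Score row: {table_mean:.1f}%")       # 33.1%
print(f"gain from location hint: {gain:.1f} points")  # 17.0 points
```

On these figures, adding the one-line location hint raises average coverage by 17.0 percentage points.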

Macro Coverage Overview

Average key point coverage, broken down by system prompt variant.

Color Scale - Simplified View (Avg. Coverage): Perfect, Excellent, Good, Fair, Poor, Bad, Not Met

Prompt avg.	Claude 3.5 Haiku	Claude Sonnet 4	Command A	Deepseek Chat V3	Deepseek R1	Gemini 2.5 Flash	Gemini 2.5 Pro Preview 05 06	Mistral Large 2411	Mistral Medium 3	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4.1	GPT 4o Mini	GPT 4o	O4 Mini	Grok 3 Mini	Grok 3
Score	16th 24.8%	10th 32.4%	8th 33.9%	9th 33.5%	5th 38.0%	4th 39.0%	14th 26.7%	11th 30.6%	3rd 39.1%	12th 29.5%	15th 26.0%	7th 36.0%	17th 22.6%	13th 27.8%	2nd 42.6%	6th 37.2%	1st 43.0%
57.2%	36%	33%	67%	85%	60%	71%	58%	64%	77%	49%	45%	54%	24%	40%	73%	68%	69%
55.6%	38%	58%	40%	69%	90%	54%	35%	38%	75%	54%	40%	54%	40%	38%	71%	69%	83%
60.8%	49%	81%	67%	0%	84%	87%	98%	53%	87%	30%	32%	79%	19%	40%	82%	58%	87%
69.1%	66%	76%	79%	34%	85%	71%	76%	59%	75%	68%	51%	64%	67%	64%	76%	79%	85%
2.9%	2%	5%	3%	2%	0%	0%	0%	2%	10%	2%	3%	3%	2%	3%	7%	3%	3%
51.1%	27%	59%	68%	43%	52%	75%	77%	52%	34%	61%	40%	48%	40%	52%	32%	54%	55%
16.4%	12%	23%	7%	17%	12%	25%	11%	7%	5%	22%	19%	23%	13%	14%	22%	23%	24%
19.5%	14%	27%	11%	20%	36%	11%	3%	33%	9%	23%	11%	20%	11%	11%	36%	17%	39%
55.7%	41%	58%	70%	80%	47%	42%	6%	58%	72%	53%	58%	80%	34%	66%	49%	61%	72%
0.6%	0%	0%	0%	2%	0%	0%	0%	0%	0%	3%	0%	0%	0%	0%	0%	0%	5%
14.9%	14%	0%	28%	15%	35%	21%	0%	4%	4%	6%	4%	19%	14%	13%	44%	25%	8%
14.5%	3%	22%	13%	0%	13%	17%	0%	13%	17%	25%	10%	27%	10%	10%	30%	17%	20%
16.9%	29%	16%	11%	10%	11%	36%	5%	8%	13%	14%	19%	26%	11%	5%	26%	30%	17%
10.0%	20%	11%	10%	13%	7%	3%	7%	3%	15%	3%	16%	20%	8%	7%	7%	10%	10%
28.0%	27%	36%	18%	29%	25%	36%	45%	20%	40%	14%	14%	32%	30%	5%	25%	39%	41%
53.2%	25%	28%	61%	70%	75%	71%	24%	57%	80%	47%	20%	54%	19%	59%	100%	51%	63%
36.2%	30%	31%	40%	50%	39%	33%	25%	45%	40%	28%	28%	40%	28%	33%	50%	26%	49%
30.5%	7%	21%	13%	54%	27%	48%	6%	19%	49%	23%	29%	27%	16%	30%	48%	42%	59%
17.1%	7%	11%	23%	11%	11%	21%	0%	18%	25%	14%	32%	14%	14%	14%	25%	29%	22%
51.5%	48%	52%	48%	65%	50%	58%	58%	58%	56%	52%	48%	36%	52%	52%	50%	44%	48%
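
The "Score" row combines a rank and a percentage in each cell (for example `16th 24.8%`). A minimal Python sketch of splitting those cells and checking that the printed ranks match the order of the scores follows. The helper name `parse_cell` and the regular expression are illustrative choices, not part of the dashboard.

```python
import re

# Model names and "Score" cells copied from the table above, in column order.
models = ["Claude 3.5 Haiku", "Claude Sonnet 4", "Command A", "Deepseek Chat V3",
          "Deepseek R1", "Gemini 2.5 Flash", "Gemini 2.5 Pro Preview 05 06",
          "Mistral Large 2411", "Mistral Medium 3", "GPT 4.1 Mini", "GPT 4.1 Nano",
          "GPT 4.1", "GPT 4o Mini", "GPT 4o", "O4 Mini", "Grok 3 Mini", "Grok 3"]
score_cells = ["16th 24.8%", "10th 32.4%", "8th 33.9%", "9th 33.5%", "5th 38.0%",
               "4th 39.0%", "14th 26.7%", "11th 30.6%", "3rd 39.1%", "12th 29.5%",
               "15th 26.0%", "7th 36.0%", "17th 22.6%", "13th 27.8%", "2nd 42.6%",
               "6th 37.2%", "1st 43.0%"]

CELL = re.compile(r"(\d+)(?:st|nd|rd|th)\s+([\d.]+)%")


def parse_cell(cell):
    """Split a cell such as '16th 24.8%' into (rank, score)."""
    match = CELL.fullmatch(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), float(match.group(2))


parsed = [(name, *parse_cell(cell)) for name, cell in zip(models, score_cells)]
by_score = sorted(parsed, key=lambda r: r[2], reverse=True)

# The printed rank should equal the position after sorting by score.
ranks_consistent = all(rank == pos
                       for pos, (_, rank, _) in enumerate(by_score, start=1))

for name, rank, score in by_score[:5]:
    print(f"{rank}. {name}: {score:.1f}%")
print("ranks consistent:", ranks_consistent)
```

In this table the top five are Grok 3, O4 Mini, Mistral Medium 3, Gemini 2.5 Flash, and Deepseek R1, which differs from the "Best Models" list near the top of the page; that list is described as covering both system prompt variations.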

Model Similarity Dendrogram

Hierarchical clustering of models based on response similarity. Models grouped closer are more similar.
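
The dendrogram itself is not reproduced here, and the page does not state which linkage method or distance it uses. As a rough illustration of the idea, the Python sketch below runs average-linkage agglomerative clustering on distances defined as 1 minus similarity. Only the two similarity values marked "reported" come from this page; the other four are hypothetical placeholders chosen to lie between them, and the function names are illustrative.

```python
from itertools import combinations

names = ["Claude 3.5 Haiku", "Gemini 2.5 Pro Preview 05 06", "GPT 4o Mini", "GPT 4o"]

# Pairwise similarity on a 0-1 scale.
sim = {
    frozenset({"GPT 4o Mini", "GPT 4o"}): 0.888,                             # reported
    frozenset({"Claude 3.5 Haiku", "Gemini 2.5 Pro Preview 05 06"}): 0.750,  # reported
    frozenset({"Claude 3.5 Haiku", "GPT 4o Mini"}): 0.82,              # HYPOTHETICAL
    frozenset({"Claude 3.5 Haiku", "GPT 4o"}): 0.80,                   # HYPOTHETICAL
    frozenset({"Gemini 2.5 Pro Preview 05 06", "GPT 4o Mini"}): 0.78,  # HYPOTHETICAL
    frozenset({"Gemini 2.5 Pro Preview 05 06", "GPT 4o"}): 0.79,       # HYPOTHETICAL
}


def distance(a, b):
    """Distance between two models, assumed to be 1 - similarity."""
    return 1.0 - sim[frozenset({a, b})]


def average_linkage(c1, c2):
    """Mean distance over all cross-cluster model pairs."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


def cluster(model_names):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [(n,) for n in model_names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: average_linkage(*p))
        merges.append((c1, c2, average_linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


merges = cluster(names)
for left, right, dist in merges:
    print(f"merge {left} + {right} at distance {dist:.3f}")
```

Each merge corresponds to one join in a dendrogram, with the merge distance as its height. With these inputs the first merge is GPT 4o Mini with GPT 4o, the pair the page reports as most similar.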