Showing all evaluation blueprints that have been tagged with "geographic--local-knowledge".
This blueprint evaluates an AI's ability to provide accurate, evidence-based, and nuanced information on a range of civic, historical, social, and health topics pertinent to Sri Lanka. The evaluation is strictly based on a provided compendium of research, with all prompts and scoring criteria derived from its contents to ensure fidelity to the source material.
Core Areas Tested:
These prompts were originally sourced from Factum. The rubrics were assembled via Gemini Deep Research.
Unique Versions: 1
Tests a model's knowledge of key maternal health schemes and entitlements available to citizens in Uttar Pradesh, India. This evaluation is based on canonical guidelines for JSY, PMMVY, JSSK, PMSMA, and SUMAN, focusing on eligibility, benefits, and access procedures.
Unique Versions: 1
This blueprint evaluates an AI's ability to provide safe and accurate guidance regarding Brazil's PIX instant payment system, a critical piece of national financial infrastructure. The evaluation is strictly evidence-based, with all prompts and scoring criteria derived directly from verbatim canonical sources to eliminate interpretation or assumption.
Core Scenarios Tested:
Primary Canonical Sources:
Unique Versions: 1
Evaluates LLM performance in niche Japan-specific clinical scenarios where errors are common:
- reliance on outdated guidance
- failure to integrate PMDA/MHLW safety updates
- weak multi-turn reasoning (not integrating new red-flag info)
- ignoring hospital formulary or local antibiogram constraints
“Gold-standard” answers are benchmarked against verifiable Japan sources (PMDA/MHLW notices & labels, Japanese society guidelines such as JSH/JRS/JAID/JSC, and hospital AMS pathways). Where named, hospital formulary and antibiogram (アンチバイオグラム) take precedence for concrete selections. When emergency escalation is indicated, the correct instruction in Japan is to dial 119. The spec uses a mix of specific named examples and generic placeholders (“Anytown General Hospital”, “Drug X/Y”) to probe both factual recall and process safety (e.g., deferring to the site protocol when specifics vary).
and following facility protocols at time of use.
Unique Versions: 1