Evaluates LLM performance in niche UK clinical scenarios where models often give suboptimal or unsafe advice. The blueprint probes four failure modes: (1) reliance on outdated knowledge, (2) failure to integrate local NHS Trust-level context (formularies and guidelines), (3) inability to adapt across conversation turns, and (4) misreading specific clauses in UK guidance.
Gold-standard answers are benchmarked against verifiable UK sources (NICE, MHRA Drug Safety Update, SPS, NHS websites, and named NHS Trust formularies/pathways). Where a local Trust is named, local guidance takes precedence over national generalities for concrete drug choices.
Placeholders (e.g., “Anytown NHS Trust”, “Drug X/Y”) are used intentionally to test whether the model knows to prioritise local guidance, request verification, or defer safely where the recency of its knowledge is limited.
Showing all recorded executions for Run Label 08278696ca247dbe.