Evaluates model responses to complex, PSNet-inspired clinical scenarios in which errors stem from failures of longitudinal synthesis, practical wisdom under uncertainty, rapport and trust, and gaps between care systems. The rubrics reward evidence-backed safe actions and explicitly penalize common failure modes: anchoring, premature closure, diagnostic overshadowing, and broken referral loops.
Showing all recorded executions for Run Label sandbox-run.