weval

A Collective Intelligence Project

PSNet AI-Hard Clinical Safety: Exemplars

Run: sandbox-run

Instances for Run Label: sandbox-run (Blueprint: PSNet AI-Hard Clinical Safety: Exemplars)

Evaluates model responses to complex PSNet-inspired clinical scenarios in which errors stem from failures of longitudinal synthesis, practical wisdom under uncertainty, or rapport and trust, and from inter-system gaps. Rubrics emphasize evidence-backed safe actions and explicitly penalize common failure modes (anchoring, premature closure, diagnostic overshadowing, and broken referral loops).

Showing all recorded executions for Run Label sandbox-run.

Executed:

Filename: sandbox-run_2025-09-27T07-40-52-988Z_comparison.json

Avg. Hybrid Score: 70.0%

Model Variants: 5

Test Cases: 4
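
The result filename listed above appears to combine the run label, a timestamp, and a `_comparison.json` suffix. The following is a minimal sketch of how such a name could be split into its parts. The pattern is an assumption inferred from this single filename, not a documented Weval convention; the helper name `parse_run_filename` is hypothetical, and the trailing `Z` is assumed to mean UTC.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred from one example only:
#   <run-label>_<YYYY-MM-DDTHH-MM-SS-mmmZ>_comparison.json
FILENAME_RE = re.compile(
    r"^(?P<label>.+)_"
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}-\d{2}-\d{2}-\d{3}Z)"
    r"_comparison\.json$"
)


def parse_run_filename(name: str) -> tuple[str, datetime]:
    """Split a result filename into (run label, execution time).

    Hypothetical helper: raises ValueError if the name does not
    follow the assumed layout.
    """
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized filename: {name!r}")
    # The timestamp uses dashes where ISO 8601 would use colons and a dot;
    # %f accepts the three-digit millisecond field.
    executed = datetime.strptime(match["ts"], "%Y-%m-%dT%H-%M-%S-%fZ")
    # Assumption: the trailing "Z" denotes UTC.
    return match["label"], executed.replace(tzinfo=timezone.utc)


label, executed = parse_run_filename(
    "sandbox-run_2025-09-27T07-40-52-988Z_comparison.json"
)
print(label, executed.isoformat())
```

If the assumption holds, the recovered label matches the Run Label shown on this page, and the timestamp gives a plausible execution time for the run.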