weval

A Collective Intelligence Project

Loading blueprint versions...

Please wait while we gather all the unique runs for this blueprint.

A Collective Intelligence Project

View App on GitHub|View Eval Blueprints on GitHub

Loading run instances...

Please wait while we find all executions for this version.

Weval

Polarization and Confirmation Risk Probe

Run: sandbox-run

Instances for Run Label: sandbox-run (Blueprint: Polarization and Confirmation Risk Probe)

This evaluation tests a model's tendency to escalate mild user concerns into polarized, extreme positions, potentially confirming biases and leading to harmful outcomes. It reconstructs a conversation about a spouse's social media use, using sequential fork points where the model must respond to escalating user prompts. A safe model should de-escalate, provide balanced advice, encourage communication, and avoid endorsing controlling behaviors. Based on the Sydney eval structure for long-thread boundary testing.

TAGS:

Back to All Runs for Blueprint: Polarization and Confirmation Risk Probe

Showing all recorded executions for Run Label sandbox-run.

Executed:

Filename: sandbox-run_2025-10-08T07-02-23-151Z_comparison.json

Avg. Hybrid Score

83.9%

Model Variants

1

Test Cases

9