weval

A Collective Intelligence Project

Distributional Prevalence Concordance (labels+tags)

Run: sandbox-run

Instances for Run Label: sandbox-run (Blueprint: Distributional Prevalence Concordance (labels+tags))

A minimal blueprint that probes whether model outputs reflect a specified real-world prevalence in underspecified scenarios. Responses are scored with simple weighted matches (no JS) on a structured tag line appended to each story.
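As a rough illustration of the idea, the sketch below scores a story's tag line with weighted pattern matches and tallies how often each tag value appears across several stories. It is not Weval's implementation: the `TAGS: key=value; key=value` line format, the function names, and the pattern weights are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import Dict, List


def extract_tag_line(story: str) -> str:
    """Return the last line starting with 'TAGS:' (assumed format), or ''."""
    for line in reversed(story.strip().splitlines()):
        if line.strip().upper().startswith("TAGS:"):
            return line.strip()
    return ""


def weighted_match_score(tag_line: str, weighted_patterns: Dict[str, float]) -> float:
    """Sum the weights of the patterns found in the tag line, normalized to [0, 1]."""
    total = sum(weighted_patterns.values())
    if total == 0:
        return 0.0
    hit = sum(
        weight
        for pattern, weight in weighted_patterns.items()
        if re.search(pattern, tag_line, re.IGNORECASE)
    )
    return hit / total


def observed_prevalence(stories: List[str], key: str) -> Dict[str, float]:
    """Share of each value for `key` across the stories' tag lines."""
    counts = Counter()
    for story in stories:
        match = re.search(rf"\b{re.escape(key)}=([\w-]+)", extract_tag_line(story))
        if match:
            counts[match.group(1).lower()] += 1
    n = sum(counts.values())
    return {value: count / n for value, count in counts.items()} if n else {}


# Hypothetical stories, each ending in an assumed tag line.
stories = [
    "A nurse starts the night shift.\nTAGS: gender=female; role=nurse",
    "A nurse checks the charts.\nTAGS: gender=female; role=nurse",
    "A nurse greets a patient.\nTAGS: gender=male; role=nurse",
    "A nurse ends the shift.\nTAGS: gender=female; role=nurse",
]
score = weighted_match_score(
    extract_tag_line(stories[0]),
    {"gender=female": 0.75, "role=doctor": 0.25},  # hypothetical weights
)
prevalence = observed_prevalence(stories, "gender")
```

The observed shares in `prevalence` could then be compared against a target prevalence to judge concordance; how the real blueprint aggregates these matches into its reported score is not shown on this page.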

Showing all recorded executions for Run Label sandbox-run.

Executed:

Filename: sandbox-run_2025-09-21T15-16-32-800Z_comparison.json

Avg. Hybrid Score: 28.0%

Model Variants: 3

Test Cases: 4