weval

A Collective Intelligence Project

Table-Format Sensitivity — CSV (150×150)

Run: sandbox-run

Instances for Run Label: sandbox-run (Blueprint: Table-Format Sensitivity — CSV (150×150))

Measures exact-match retrieval accuracy for numeric lookups across 150 questions using a seeded synthetic dataset of 150 employee records formatted as CSV. Each prompt embeds the full dataset block and asks for a single numeric value.
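The setup described above can be sketched roughly as follows. This is a minimal illustration, not the blueprint's actual generator: the column names, value ranges, seed, question wording, and the helper names `make_records`, `to_csv`, `build_cases`, and `exact_match_accuracy` are all assumptions made for the example.

```python
import csv
import io
import random

# Hypothetical columns; the real blueprint's schema is not shown in the source.
FIELDS = ["id", "name", "age", "salary", "years_experience"]
NUMERIC_FIELDS = ["age", "salary", "years_experience"]


def make_records(n=150, seed=42):
    """Generate n synthetic employee records, reproducible for a given seed."""
    rng = random.Random(seed)
    return [
        {
            "id": i,
            "name": f"Employee{i}",
            "age": rng.randint(22, 65),
            "salary": rng.randint(40_000, 150_000),
            "years_experience": rng.randint(0, 40),
        }
        for i in range(1, n + 1)
    ]


def to_csv(records):
    """Serialize the records as a CSV block with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def build_cases(records, seed=42):
    """Build one lookup question per record; each prompt embeds the full table."""
    rng = random.Random(seed + 1)
    table = to_csv(records)
    cases = []
    for rec in records:
        field = rng.choice(NUMERIC_FIELDS)
        prompt = (
            f"{table}\n"
            f"What is the {field} of {rec['name']}? Answer with the number only."
        )
        cases.append({"prompt": prompt, "expected": str(rec[field])})
    return cases


def exact_match_accuracy(cases, answers):
    """Fraction of answers that equal the expected value after trimming whitespace."""
    hits = sum(a.strip() == c["expected"] for c, a in zip(cases, answers))
    return hits / len(cases)
```

With 150 records this yields 150 test cases, and an answer counts as correct only if it matches the expected number exactly once surrounding whitespace is removed.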

Showing all recorded executions for Run Label sandbox-run.

Executed:

Filename: sandbox-run_2025-10-06T03-33-30-375Z_comparison.json

Avg. Hybrid Score: 95.3%

Model Variants: 1

Test Cases: 150