Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we find all executions for this version.
Measures exact-match retrieval accuracy for numeric lookups across 150 questions using a seeded synthetic dataset of 150 employee records formatted as CSV. Each prompt embeds the full dataset block and asks for a single numeric value.
Showing all recorded executions for Run Label sandbox-run.