Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Please wait while we find all executions for this version.
Combined blueprint covering multiple data formats. Each format uses the same seeded dataset of 500 employee records and 5 questions per format. We measure exact-match numeric retrieval per prompt.
References:
Reproduction command:
python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 500 --per-format-questions 5 --temperatures 0.0, 0.1 --systems null --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER
Showing all recorded executions for Run Label d519b32d855a0754.