Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Combined blueprint covering multiple data formats. Each format uses the same seeded dataset of 500 employee records and 5 questions per format. We measure exact-match numeric retrieval per prompt.
References:
Reproduction command:
python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 500 --per-format-questions 5 --temperatures 0.0, 0.1 --systems null --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER