Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Combined blueprint covering multiple data formats. Each format uses the same seeded dataset of 30 employee records and 5 questions per format. We measure exact-match numeric retrieval per prompt.
References:
Reproduction command:
python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 30 --per-format-questions 5 --temperatures 0.0, 0.1, 0.2 --systems both --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER