weval


Analysis: Table-Format Sensitivity — Combined (11 formats, 30×5/fmt) — All Runs


Combined blueprint covering 11 data formats. Every format is rendered from the same seeded dataset of 30 employee records and is tested with 5 questions. Each prompt is scored by exact-match numeric retrieval.
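The setup described above can be sketched roughly as follows. This is a minimal illustration, not the code in generate_table_format_eval.py: the field names (id, age, salary), the helper names (make_records, render, exact_match), and the normalization rules in the scorer are all assumptions, and only three of the formats are shown.

```python
import csv
import io
import json
import random
import re


def make_records(num_records=30, seed=0):
    """Build a reproducible list of hypothetical employee records."""
    rng = random.Random(seed)  # fixed seed -> identical data for every format
    return [
        {
            "id": i,
            "age": rng.randint(22, 65),
            "salary": rng.randrange(40_000, 150_000, 1_000),
        }
        for i in range(1, num_records + 1)
    ]


def render(records, fmt):
    """Serialize the same records into one of a few illustrative formats."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(
            buf, fieldnames=list(records[0]), lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    if fmt == "markdown_kv":
        # One "key: value" block per record, separated by blank lines.
        return "\n\n".join(
            "\n".join(f"{key}: {value}" for key, value in rec.items())
            for rec in records
        )
    raise ValueError(f"unsupported format: {fmt}")


def exact_match(response, expected):
    """Assumed scoring rule: the reply must be exactly the expected integer.

    Surrounding whitespace and thousands separators are tolerated;
    any extra words make the answer count as wrong.
    """
    cleaned = response.strip().replace(",", "")
    if not re.fullmatch(r"-?\d+", cleaned):
        return False
    return int(cleaned) == expected
```

Because the records come from a seeded generator, only the serialization changes between formats, so score differences can be attributed to the format rather than to the data. The strict scorer shown here is one possible reading of "exact-match"; the blueprint's actual matching rule may be more or less lenient.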

References:

Blog: Which Table Format Do LLMs Understand Best?
Script: https://github.com/weval-org/configs/blob/main/scripts/generate_table_format_eval.py

Reproduction command:

python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 30 --per-format-questions 5 --temperatures 0.0,0.1,0.2 --systems both --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER

TAGS:

SANDBOX_TEST

Runs (1)

sandbox-run

55 prompts, 2 models

06/10/2025, 03:51:00 | 100.0%
