Combined blueprint covering multiple data formats. Every format is rendered from the same seeded dataset of 500 employee records and is tested with 5 questions. We measure exact-match numeric retrieval per prompt.
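The blueprint's scorer is not shown on this page, so the following is only a minimal sketch of what "exact-match numeric retrieval" could look like. The function names `extract_numbers` and `exact_match` are hypothetical, and the handling of thousands separators is an assumption rather than the blueprint's actual rule.

```python
import re

# Matches integers and decimals, optionally with comma thousands separators
# (for example "84250", "84,250", "-3.5"). This pattern is an assumption.
NUMBER_RE = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> list[str]:
    """Return every numeric token in the text with thousands separators removed."""
    return [tok.replace(",", "") for tok in NUMBER_RE.findall(text)]


def exact_match(response: str, expected: float) -> bool:
    """True if any number in the response equals the expected value exactly."""
    return any(float(tok) == float(expected) for tok in extract_numbers(response))


if __name__ == "__main__":
    print(exact_match("The salary is 84,250.", 84250))
    print(exact_match("Roughly 84,000 per year.", 84250))
```

A stricter variant would require the response to contain only the expected number, which penalizes answers that list several candidates.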
Reproduction command:
python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 500 --per-format-questions 5 --temperatures 0.0,0.1 --systems null --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER
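The generator script itself is not reproduced here. As a rough illustration of the idea behind it (one seeded dataset rendered into several formats so that only the presentation changes), here is a sketch. The field names, value ranges, and helper names are all hypothetical and are not taken from `generate_table_format_eval.py`.

```python
import csv
import io
import json
import random


def make_records(n: int, seed: int = 0) -> list[dict]:
    """Build n synthetic employee records; the same seed gives the same data."""
    rng = random.Random(seed)
    return [
        {"id": i, "age": rng.randint(22, 65), "salary": rng.randrange(40000, 150000, 500)}
        for i in range(1, n + 1)
    ]


def to_json(records: list[dict]) -> str:
    return json.dumps(records, indent=2)


def to_csv(records: list[dict]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_markdown_table(records: list[dict]) -> str:
    cols = list(records[0])
    lines = ["| " + " | ".join(cols) + " |", "|" + "|".join("---" for _ in cols) + "|"]
    lines += ["| " + " | ".join(str(r[c]) for c in cols) + " |" for r in records]
    return "\n".join(lines)


if __name__ == "__main__":
    records = make_records(500, seed=42)
    for render in (to_json, to_csv, to_markdown_table):
        print(render.__name__, len(render(records)), "characters")
```

Because every renderer receives the same `records` list, a question such as "what is the salary of employee 17?" has the same expected answer in every format.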
Average key point coverage extent for each model across all prompts.
Prompts vs. Models | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Claude Sonnet 4.5 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Score | 13th 86.4% | 18th 80.9% | 11th 95.5% | 28th 49.1% | 27th 51.8% | 26th 53.6% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 31st 19.1% | 29th 32.7% | 21st 67.3% | 10th 96.4% | 25th 56.4% | 23rd 66.4% | 30th 27.3% | 1st 100.0% | 21st 67.3% | 17th 82.7% | 13th 86.4% | 24th 63.6% | 1st 100.0% | 12th 94.5% | 16th 84.3% | 9th 98.2% | 19th 70.0% | 15th 85.5% | 20th 67.6% | 1st 100.0% | 1st 100.0% | |
71.0% | 0% | 0% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
88.7% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
93.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
69.4% | 50% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 50% | 100% | 0% | 100% | 0% | 100% | 0% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | |
79.0% | 50% | 50% | 50% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 50% | 0% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
64.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 100% | 0% | 50% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 100% | 100% | |
80.6% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
79.0% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 0% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
71.0% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 100% | 0% | 100% | 0% | 100% | 100% | 0% | 100% | 100% | 50% | 100% | 100% | 0% | 100% | 100% | 100% | |
69.4% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 0% | 50% | 0% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 50% | 100% | 100% | |
62.9% | 0% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 0% | 0% | 100% | 100% | 0% | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | |
83.9% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
85.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
67.7% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 100% | 0% | 100% | 0% | 0% | 100% | 50% | 100% | 100% | 100% | 100% | |
77.4% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 0% | 0% | 100% | 50% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
74.2% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | |
85.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
85.0% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | ||
69.4% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 0% | 100% | 0% | 0% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
69.4% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 0% | 100% | 0% | 50% | 100% | 100% | |
69.4% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 50% | 0% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 0% | 100% | 100% | |
80.6% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 50% | 50% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
83.9% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
58.1% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 0% | 0% | 100% | 0% | 0% | 100% | 0% | 100% | 0% | 0% | 100% | 50% | 100% | 0% | 100% | 100% | |
82.3% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 0% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
72.6% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 100% | 0% | 100% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | |
80.6% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 50% | 100% | 0% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
85.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
75.8% | 50% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 50% | 100% | 100% | |
79.0% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 50% | 0% | 100% | 50% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | |
72.6% | 0% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 50% | 100% | 100% | 0% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | |
83.9% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 50% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | |
93.5% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | |
74.2% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | |
79.0% | 100% | 50% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 50% | 50% | 50% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | |
74.2% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 50% | 0% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 0% | 100% | 100% | |
77.4% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 0% | 100% | 50% | 100% | 100% | 100% | 100% | |
88.7% | 100% | 100% | 0% | 50% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
75.8% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 50% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
82.3% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | |
69.4% | 0% | 50% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 0% | 50% | 0% | 100% | 0% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
82.3% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | |
85.5% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
66.1% | 50% | 50% | 100% | 0% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 50% | 0% | 100% | 100% | 0% | 100% | 50% | 100% | 50% | 100% | 100% | |
75.8% | 50% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 100% | 50% | 100% | 100% | 50% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
67.7% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 100% | 100% | |
73.3% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 0% | 100% | 50% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 0% | 100% | 100% | ||
85.5% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 0% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | |
72.6% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | |
69.4% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 0% | 0% | 50% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | |
66.1% | 0% | 0% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 0% | 100% | 0% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | |
87.1% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 0% | 50% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | |
87.1% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 50% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |
62.9% | 100% | 50% | 0% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 0% | 50% | 0% | 100% | 0% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 50% | 100% | 0% | 100% | 100% | |
80.6% | 100% | 50% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | 0% | 0% | 50% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% |
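The prompt labels for the rows did not survive on this page, but the first cell of each row appears to be the mean of that row's populated model cells. The sketch below checks this for two rows copied from the table, one of which has a blank cell. The helper names `parse_row` and `row_mean` are hypothetical, and treating a blank cell as "no result" instead of 0% is an assumption.

```python
def parse_row(line: str) -> tuple[float, list[float]]:
    """Split a table row into (reported average, per-model scores).

    Blank cells are skipped, so a model without a result is left out
    of the mean instead of being counted as 0%.
    """
    cells = [c.strip() for c in line.split("|")]
    values = [float(c.rstrip("%")) for c in cells if c.endswith("%")]
    return values[0], values[1:]


def row_mean(line: str) -> tuple[float, float, int]:
    """Return (reported average, recomputed average, number of scored models)."""
    reported, scores = parse_row(line)
    return reported, round(sum(scores) / len(scores), 1), len(scores)


# First data row of the table (31 populated model cells).
ROW_FULL = (
    "71.0% | 0% | 0% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | "
    "0% | 0% | 100% | 100% | 100% | 0% | 0% | 100% | 100% | 100% | 100% | "
    "0% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | |"
)

# The 85.0% row, which has one blank cell (30 populated model cells).
ROW_WITH_GAP = (
    "85.0% | 100% | 100% | 100% | 50% | 50% | 50% | 100% | 100% | 100% | 100% | "
    "50% | 100% | 50% | 100% | 0% | 100% | 0% | 100% | 100% | 100% | 100% | "
    "100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | ||"
)

if __name__ == "__main__":
    for row in (ROW_FULL, ROW_WITH_GAP):
        print(row_mean(row))
```

In both rows the recomputed mean matches the reported first cell. The row with the blank cell only matches when the mean is taken over 30 values, which is consistent with the missing model being excluded from the average. Which model's cell is missing cannot be recovered from this page.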