Combined blueprint covering 11 data formats. Each format uses the same seeded dataset of 30 employee records and is probed with 5 questions, giving 55 prompts in total. Each prompt is scored on exact-match retrieval of a numeric value.
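The setup described above (one seeded dataset, rendered in several formats, each paired with the same kind of numeric question) can be sketched roughly as follows. This is a hypothetical illustration, not the blueprint's generator script: the field names (`id`, `name`, `department`, `salary`), the seed, the three renderers shown (the blueprint uses 11 formats), the helper names, and the strict whole-response numeric matcher are all assumptions.

```python
import csv
import io
import json
import random

# Hypothetical schema: the blueprint says only "30 employee records".
FIELDS = ["id", "name", "department", "salary"]


def make_records(n=30, seed=0):
    """Build a reproducible list of employee records from a fixed seed."""
    rng = random.Random(seed)
    departments = ["Engineering", "Sales", "Finance", "Support"]
    return [
        {
            "id": i + 1,
            "name": f"Employee {i + 1}",
            "department": rng.choice(departments),
            "salary": rng.randrange(40_000, 150_000, 1_000),
        }
        for i in range(n)
    ]


def to_json(records):
    return json.dumps(records, indent=2)


def to_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_markdown_kv(records):
    # One block of "**key**: value" lines per record, blank line between records.
    return "\n\n".join(
        "\n".join(f"**{key}**: {record[key]}" for key in FIELDS) for record in records
    )


# Only three of the blueprint's 11 formats are sketched here.
RENDERERS = {"json": to_json, "csv": to_csv, "markdown_kv": to_markdown_kv}


def build_prompts(records, question):
    """Pair the same records, rendered in each format, with the same question."""
    return {
        fmt: f"{render(records)}\n\n{question}\nAnswer with the number only."
        for fmt, render in RENDERERS.items()
    }


def exact_match_numeric(response, expected):
    """Score 1.0 only if the whole response parses to exactly `expected`."""
    # Tolerate surrounding whitespace, a trailing period, thousands separators,
    # and a leading dollar sign; anything else makes the parse fail.
    cleaned = response.strip().rstrip(".").replace(",", "").lstrip("$")
    try:
        return 1.0 if float(cleaned) == expected else 0.0
    except ValueError:
        return 0.0


if __name__ == "__main__":
    records = make_records()
    target = records[6]
    prompts = build_prompts(records, f"What is the salary of {target['name']}?")
    print(sorted(prompts))
    print(exact_match_numeric(f"${target['salary']:,}", target["salary"]))
```

Because the seed is fixed, every format carries exactly the same values, so any difference in scores across formats can be attributed to the presentation rather than the data. A stricter or looser matcher (for example, extracting the first number from a longer answer) would change the scores, so the matching rule is a design choice worth stating explicitly.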
Reproduction command:
python3 scripts/generate_table_format_eval.py --combined --formats json,csv,xml,yaml,html,markdown_table,markdown_kv,ini,pipe_delimited,jsonl,natural_language --num-records 30 --per-format-questions 5 --temperatures 0.0,0.1,0.2 --systems both --out-dir blueprints/table-format-sensitivity --models CORE,FRONTIER
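Average key-point coverage for each model across all 55 prompts (11 formats × 5 questions). The top row gives each model's rank and mean score, with tied models sharing a rank; in every other row, the first cell is that prompt's mean score across all 31 models.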
| Prompt (mean across models) | Claude 3.5 Sonnet | Claude 3.7 Sonnet | Claude 3.5 Haiku | Claude Opus 4.1 | Claude Sonnet 4 | Claude Sonnet 4.5 | Deepseek Chat V3.1 | Deepseek R1 | Gemini 2.5 Flash | Gemini 2.5 Pro | Gemma 3 12b It | Llama 3 70b Instruct | Llama 4 Maverick | Meta Llama 3.1 405b Instruct Turbo | Mistral Large 2411 | Mistral Medium 3 | Mistral Nemo | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini | GLM 4.5 | Qwen3 30b A3B Instruct 2507 | Qwen3 32b | Grok 3 | Grok 4 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Overall (rank, mean) | 1st 100.0% | 20th 99.1% | 1st 100.0% | 31st 53.6% | 25th 95.9% | 22nd 98.6% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 30th 65.5% | 1st 100.0% | 27th 80.0% | 1st 100.0% | 1st 100.0% | 26th 93.6% | 28th 76.8% | 1st 100.0% | 1st 100.0% | 23rd 98.2% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 1st 100.0% | 29th 69.5% | 20th 99.1% | 24th 97.3% | 1st 100.0% | 1st 100.0% |
| 92.7% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 91.9% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 90.3% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% |
| 91.1% | 100% | 50% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 91.9% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 50% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.8% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 91.9% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 90.3% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 25% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 96.8% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 92.7% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 93.5% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 97.6% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 93.5% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 25% | 100% | 75% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 93.5% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% |
| 92.7% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 50% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.8% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 75% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 50% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 91.1% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 75% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 93.5% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 25% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 93.5% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 50% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 75% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 90.3% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 25% | 100% | 100% | 100% | 0% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 95.2% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% |
| 96.0% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
| 94.4% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 75% | 100% | 50% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 100% | 50% | 100% | 100% | 100% | 100% |
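The top row and first column of the table above can be derived from the per-prompt cells roughly as sketched below. This assumes plain arithmetic means, rounding to the one decimal place the table displays before ranking, and standard competition ranking (tied models share a rank, and the next rank skips past the tie); `summarize` and `ordinal` are hypothetical helpers, not part of the evaluation tooling.

```python
from statistics import mean


def ordinal(n):
    """English ordinal label: 1 -> '1st', 12 -> '12th', 22 -> '22nd'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def summarize(grid):
    """Summarize a score grid.

    `grid` maps each model name to its per-prompt scores in percent, with every
    list in the same prompt order. Returns (model labels, per-prompt means).
    """
    # Model means, rounded to one decimal place as displayed in the table.
    model_avg = {m: round(mean(scores), 1) for m, scores in grid.items()}

    # Standard competition ranking: rank = 1 + number of models scoring
    # strictly higher, so tied models share a rank and the next rank skips ahead.
    labels = {}
    for m, avg in model_avg.items():
        rank = 1 + sum(other > avg for other in model_avg.values())
        labels[m] = f"{ordinal(rank)} {avg:.1f}%"

    # Per-prompt means across all models (the table's first column).
    n_prompts = len(next(iter(grid.values())))
    prompt_avg = [
        round(mean(scores[i] for scores in grid.values()), 1) for i in range(n_prompts)
    ]
    return labels, prompt_avg


if __name__ == "__main__":
    toy = {"A": [100, 100], "B": [100, 100], "C": [100, 50], "D": [50, 50]}
    labels, prompt_avg = summarize(toy)
    print(labels)  # A and B share 1st; C is 3rd, not 2nd
    print(prompt_avg)
```

Under this scheme, the 19 models tied at 100.0% all receive 1st, so the next score down is ranked 20th, which matches the jump in the table's top row. As a spot check, Claude Opus 4.1's column holds 51 scores of 50% and 4 of 100%, which averages 2950 / 55 ≈ 53.6%.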