Analysis: PR 17 NewJerseyStyle Digital Benefit US SSI/SSDI POMS Close Questions 60 - Run pr-17...

Digital Benefit US SSI/SSDI POMS Close Questions 60

Tests a model's ability to give supportive and accurate answers to questions about U.S. SSI/SSDI eligibility.

Average key-point coverage for each model across all prompts. The first column gives each prompt's average across the five models.

Prompt avg.	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini
Model avg. (rank)	46.8% (1st)	31.4% (5th)	33.7% (4th)	35.8% (2nd)	35.4% (3rd)
31.8%	40%	31%	26%	31%	31%
28.4%	31%	29%	26%	27%	29%
41.4%	47%	31%	35%	44%	50%
47.8%	52%	44%	46%	53%	44%
34.8%	52%	27%	35%	29%	31%
42.0%	56%	34%	31%	45%	44%
36.2%	59%	27%	31%	35%	29%
37.4%	46%	36%	39%	34%	32%
34.8%	49%	28%	35%	29%	33%
31.6%	36%	27%	33%	31%	31%
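The summary figures follow from simple arithmetic: each first-column value is the mean of that prompt's five model scores, and each model's overall score is the mean of its ten prompt scores. The Python sketch below recomputes both from the displayed percentages and ranks the models. It is an illustration only: the function names are hypothetical, and the evaluation platform's own implementation is not shown on this page. Because the cells display whole percentages, the underlying scores may be more precise, although recomputing from the displayed values reproduces every published average here.

```python
from statistics import mean

# Illustrative sketch; not the evaluation platform's actual implementation.
MODELS = ["GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o", "GPT 4o Mini"]

# Per-prompt coverage scores (%) as displayed in the table, one row per prompt,
# columns in MODELS order. Prompt names are not available on the page.
SCORES = [
    [40, 31, 26, 31, 31],
    [31, 29, 26, 27, 29],
    [47, 31, 35, 44, 50],
    [52, 44, 46, 53, 44],
    [52, 27, 35, 29, 31],
    [56, 34, 31, 45, 44],
    [59, 27, 31, 35, 29],
    [46, 36, 39, 34, 32],
    [49, 28, 35, 29, 33],
    [36, 27, 33, 31, 31],
]


def prompt_averages(scores):
    """Mean across models for each prompt (the first column), to one decimal."""
    return [round(mean(row), 1) for row in scores]


def model_averages(scores):
    """Mean across prompts for each model (the overall score row), to one decimal."""
    return [round(mean(column), 1) for column in zip(*scores)]


def rank_models(models, averages):
    """Return (rank, model, average) tuples ordered from highest to lowest average."""
    ordered = sorted(zip(models, averages), key=lambda pair: pair[1], reverse=True)
    return [(i + 1, model, avg) for i, (model, avg) in enumerate(ordered)]


if __name__ == "__main__":
    print(prompt_averages(SCORES))
    for rank, model, avg in rank_models(MODELS, model_averages(SCORES)):
        print(f"{rank}. {model}: {avg}%")
```

Run as a script, this prints the ten prompt averages and then GPT 4.1 first (46.8%) and GPT 4.1 Mini last (31.4%), matching the ranks in the table. Ties are not handled specially; `sorted` keeps tied models in their original column order.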