Analysis: Pr 19 NewJerseyStyle Digital Benefit Us Ssissdi Poms Open Questions 10 - Run pr-19...

Digital Benefit US SSI/SSDI POMS Open Questions 10

Tests a model's ability to give supportive and accurate responses to open-ended questions about U.S. SSI/SSDI eligibility.

Average key point coverage extent for each model across all prompts.

Color scale, simplified view (avg. coverage), best to worst: Perfect, Excellent, Good, Fair, Poor, Bad, Not Met

Prompt avg.	GPT 4.1	GPT 4.1 Mini	GPT 4.1 Nano	GPT 4o	GPT 4o Mini
Score (rank)	67.6% (1st)	62.1% (2nd)	61.9% (3rd)	56.0% (5th)	61.6% (4th)
54.8%	64%	60%	67%	43%	40%
65.6%	67%	64%	67%	67%	63%
66.4%	67%	67%	67%	64%	67%
63.0%	67%	68%	71%	46%	63%
65.0%	79%	33%	13%	100%	100%
56.6%	67%	67%	64%	42%	43%
61.2%	67%	64%	64%	47%	64%
61.0%	67%	64%	67%	44%	63%
68.6%	67%	67%	71%	67%	71%
56.2%	64%	67%	68%	40%	42%
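
The summary figures in the table appear to be plain unweighted means of the per-prompt cells: averaging each model's column reproduces the Score row, and averaging each prompt's row reproduces the leftmost percentage. The sketch below checks this under that assumption. The page does not state how the scores are aggregated, the cell values are the rounded percentages as displayed, and the variable names are illustrative only.

```python
# Model columns, in the order shown in the table.
models = ["GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o", "GPT 4o Mini"]

# Per-prompt coverage percentages, one row per prompt, copied from the table.
coverage = [
    [64, 60, 67, 43, 40],
    [67, 64, 67, 67, 63],
    [67, 67, 67, 64, 67],
    [67, 68, 71, 46, 63],
    [79, 33, 13, 100, 100],
    [67, 67, 64, 42, 43],
    [67, 64, 64, 47, 64],
    [67, 64, 67, 44, 63],
    [67, 67, 71, 67, 71],
    [64, 67, 68, 40, 42],
]


def mean(values):
    """Unweighted arithmetic mean (assumed aggregation, not confirmed by the page)."""
    return sum(values) / len(values)


# Leftmost column: average across models for each prompt.
prompt_avg = [round(mean(row), 1) for row in coverage]

# Score row: average across prompts for each model.
model_avg = {
    name: round(mean([row[i] for row in coverage]), 1)
    for i, name in enumerate(models)
}

# Rank models from highest to lowest average coverage.
ranking = sorted(models, key=lambda name: model_avg[name], reverse=True)

for place, name in enumerate(ranking, start=1):
    print(f"{place}. {name}: {model_avg[name]}%")
```

Under this assumption the ranking matches the one shown (GPT 4.1 first, GPT 4o last). Note that the fifth prompt has by far the widest spread, from 13% to 100%, so a single prompt moves the overall averages noticeably.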