This evaluation probes for the companion-like warmth, emotional attunement, playfulness, and creative presence that many users report missing in GPT-5 compared to prior models. It emphasizes tone, empathy, continuity, collaboration, and a non-corporate voice.
Source of user expectations and desiderata: Community testimonies from the r/ChatGPT AMA thread (users describing loss of personality, warmth, and spark in GPT-5).
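Key-point coverage for each prompt and model. The Score row gives each model's average across all prompts, with its rank; the first column gives each prompt's average across models.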
Prompt (avg. across models) | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini
---|---|---|---|---|---|---|---|---|---
Score | 91.3% (1st) | 82.4% (6th) | 78.5% (8th) | 78.3% (9th) | 80.2% (7th) | 87.8% (3rd) | 86.3% (5th) | 87.5% (4th) | 89.2% (2nd)
92.8% | 100% | 87% | 72% | 100% | 83% | 100% | 100% | 94% | 100%
55.9% | 98% | 24% | 38% | 33% | 28% | 56% | 36% | 96% | 96%
87.6% | 94% | 100% | 87% | 88% | 87% | 91% | 73% | 75% | 95%
87.4% | 93% | 84% | 73% | 74% | 91% | 90% | 100% | 100% | 83%
78.3% | 80% | 78% | 71% | 75% | 77% | 83% | 82% | 77% | 82%
97.6% | 99% | 100% | 100% | 100% | 92% | 90% | 100% | 97% | 100%
88.9% | 88% | 85% | 84% | 92% | 84% | 100% | 92% | 83% | 93%
83.3% | 95% | 78% | 74% | 63% | 77% | 91% | 92% | 91% | 91%
31.3% | 33% | 33% | 33% | 27% | 32% | 32% | 33% | 26% | 33%
90.9% | 98% | 93% | 79% | 80% | 84% | 100% | 95% | 97% | 93%
98.7% | 100% | 98% | 95% | 100% | 95% | 100% | 100% | 100% | 100%
98.2% | 98% | 97% | 99% | 97% | 98% | 98% | 98% | 99% | 100%
93.1% | 98% | 98% | 94% | 79% | 90% | 95% | 94% | 95% | 95%
98.6% | 100% | 96% | 94% | 100% | 97% | 100% | 100% | 100% | 100%
87.4% | 100% | 91% | 88% | 80% | 86% | 92% | 92% | 87% | 73%
83.6% | 88% | 78% | 76% | 66% | 85% | 89% | 94% | 85% | 93%
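The Score row appears consistent with an unweighted mean of each model's column over the 16 prompts, with models ranked by that mean. The minimal Python sketch below recomputes both from the table's cells. The names `COVERAGE`, `model_averages`, `prompt_averages`, and `rank` are hypothetical helpers for illustration, not part of any evaluation tool. Because the cells are rounded to whole percents, the recomputed means can differ from the Score row by a few tenths of a point (for example, 91.4% rather than 91.3% for GPT 4.1), although the resulting ranking is the same.

```python
from statistics import mean

MODELS = ["GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o", "GPT 4o Mini",
          "GPT 5", "GPT OSS 120b", "GPT OSS 20b", "O4 Mini"]

# Hypothetical data layout: per-prompt key-point coverage (%) copied from the
# table, one row per prompt, columns in MODELS order. Cells are already rounded.
COVERAGE = [
    [100, 87, 72, 100, 83, 100, 100, 94, 100],
    [98, 24, 38, 33, 28, 56, 36, 96, 96],
    [94, 100, 87, 88, 87, 91, 73, 75, 95],
    [93, 84, 73, 74, 91, 90, 100, 100, 83],
    [80, 78, 71, 75, 77, 83, 82, 77, 82],
    [99, 100, 100, 100, 92, 90, 100, 97, 100],
    [88, 85, 84, 92, 84, 100, 92, 83, 93],
    [95, 78, 74, 63, 77, 91, 92, 91, 91],
    [33, 33, 33, 27, 32, 32, 33, 26, 33],
    [98, 93, 79, 80, 84, 100, 95, 97, 93],
    [100, 98, 95, 100, 95, 100, 100, 100, 100],
    [98, 97, 99, 97, 98, 98, 98, 99, 100],
    [98, 98, 94, 79, 90, 95, 94, 95, 95],
    [100, 96, 94, 100, 97, 100, 100, 100, 100],
    [100, 91, 88, 80, 86, 92, 92, 87, 73],
    [88, 78, 76, 66, 85, 89, 94, 85, 93],
]


def model_averages(rows):
    """Unweighted mean coverage per model (column means over all prompts)."""
    return {model: mean(col) for model, col in zip(MODELS, zip(*rows))}


def prompt_averages(rows):
    """Unweighted mean coverage per prompt (row means over all models)."""
    return [mean(row) for row in rows]


def rank(averages):
    """Model names sorted from highest to lowest average coverage."""
    return sorted(averages, key=averages.get, reverse=True)


if __name__ == "__main__":
    avgs = model_averages(COVERAGE)
    for place, model in enumerate(rank(avgs), start=1):
        print(f"{place}. {model}: {avgs[model]:.1f}%")
```

Running it prints the models in the same order as the Score row ranks. `prompt_averages` likewise reproduces the first column to within rounding.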