This evaluation probes for the companion-like warmth, emotional attunement, playfulness, and creative presence that many users report missing in GPT-5 compared to prior models. It emphasizes tone, empathy, continuity, collaboration, and a non-corporate voice.
Source of user expectations and desiderata: community testimonies from the r/ChatGPT AMA thread, in which users describe a loss of personality, warmth, and spark in GPT-5.
Average key-point coverage for each model across all prompts.
Prompt (mean across models) | GPT 4.1 | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o | GPT 4o Mini | GPT 5 | GPT OSS 120b | GPT OSS 20b | O4 Mini
---|---|---|---|---|---|---|---|---|---
Overall score (rank) | 91.6% (2nd) | 85.6% (6th) | 83.0% (8th) | 82.9% (9th) | 83.9% (7th) | 89.8% (4th) | 87.9% (5th) | 90.8% (3rd) | 92.3% (1st)
96.1% | 96% | 96% | 85% | 100% | 91% | 100% | 100% | 99% | 100%
64.4% | 100% | 28% | 50% | 49% | 41% | 64% | 50% | 100% | 99%
71.3% | 80% | 76% | 64% | 74% | 78% | 72% | 59% | 63% | 77%
89.0% | 87% | 79% | 80% | 80% | 90% | 97% | 100% | 100% | 89%
99.1% | 100% | 99% | 100% | 99% | 100% | 98% | 100% | 100% | 97%
97.0% | 100% | 100% | 100% | 94% | 91% | 100% | 91% | 97% | 100%
87.4% | 81% | 93% | 92% | 78% | 84% | 100% | 92% | 79% | 90%
94.9% | 92% | 97% | 97% | 87% | 94% | 91% | 100% | 100% | 99%
78.4% | 83% | 77% | 77% | 77% | 80% | 78% | 78% | 80% | 78%
95.8% | 100% | 100% | 84% | 93% | 92% | 100% | 97% | 100% | 98%
81.6% | 91% | 78% | 58% | 91% | 80% | 76% | 82% | 87% | 92%
96.3% | 99% | 97% | 93% | 93% | 93% | 100% | 100% | 96% | 99%
88.7% | 91% | 90% | 86% | 81% | 84% | 94% | 85% | 93% | 96%
98.6% | 98% | 97% | 95% | 100% | 100% | 100% | 100% | 99% | 99%
68.1% | 70% | 78% | 78% | 53% | 55% | 69% | 75% | 68% | 67%
93.8% | 100% | 87% | 92% | 81% | 92% | 100% | 100% | 96% | 99%
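The overall score row looks like an unweighted mean of each model's per-prompt coverage, and the first column like an unweighted mean of each prompt's coverage across models. The page does not state this, so the Python sketch below only assumes that aggregation and recomputes both from the transcribed cells. The names `MODELS`, `ROWS`, `model_means`, `prompt_means`, and `rank` are hypothetical. Because the displayed cells are rounded, the recomputed means come out slightly higher than the displayed scores (by at most about 0.3 points, e.g. 92.44 vs. 92.3 for O4 Mini), but the resulting ranking matches the table.

```python
from statistics import mean

# Column order as in the table (hypothetical transcription, not an official API).
MODELS = [
    "GPT 4.1", "GPT 4.1 Mini", "GPT 4.1 Nano", "GPT 4o", "GPT 4o Mini",
    "GPT 5", "GPT OSS 120b", "GPT OSS 20b", "O4 Mini",
]

# Per-prompt key-point coverage (%), one row per prompt, copied from the table.
# Cells are rounded, so everything computed from them is approximate.
ROWS = [
    [96, 96, 85, 100, 91, 100, 100, 99, 100],
    [100, 28, 50, 49, 41, 64, 50, 100, 99],
    [80, 76, 64, 74, 78, 72, 59, 63, 77],
    [87, 79, 80, 80, 90, 97, 100, 100, 89],
    [100, 99, 100, 99, 100, 98, 100, 100, 97],
    [100, 100, 100, 94, 91, 100, 91, 97, 100],
    [81, 93, 92, 78, 84, 100, 92, 79, 90],
    [92, 97, 97, 87, 94, 91, 100, 100, 99],
    [83, 77, 77, 77, 80, 78, 78, 80, 78],
    [100, 100, 84, 93, 92, 100, 97, 100, 98],
    [91, 78, 58, 91, 80, 76, 82, 87, 92],
    [99, 97, 93, 93, 93, 100, 100, 96, 99],
    [91, 90, 86, 81, 84, 94, 85, 93, 96],
    [98, 97, 95, 100, 100, 100, 100, 99, 99],
    [70, 78, 78, 53, 55, 69, 75, 68, 67],
    [100, 87, 92, 81, 92, 100, 100, 96, 99],
]


def model_means(rows):
    """Assumed overall score: unweighted mean of each model's column."""
    return {m: mean(row[i] for row in rows) for i, m in enumerate(MODELS)}


def prompt_means(rows):
    """Assumed first-column value: unweighted mean of each prompt's row."""
    return [mean(row) for row in rows]


def rank(scores):
    """Model names ordered from highest to lowest mean coverage."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    scores = model_means(ROWS)
    for place, model in enumerate(rank(scores), start=1):
        print(f"{place}. {model}: {scores[model]:.2f}%")
```

Recomputing from rounded cells is only a consistency check; the page's own scores are presumably derived from unrounded per-prompt values.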