Tests the model's knowledge of characters and plot points from the sitcom 'Full House'.
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3 Haiku 20240307 | Gemini 2.5 Flash | Llama 3 8b Instruct | GPT 4.1 Mini | GPT 4.1 Nano | GPT 4o Mini |
|---|---|---|---|---|---|---|
| Score | 19.8% (5th) | 47.0% (1st) | 13.5% (6th) | 21.5% (3rd) | 20.0% (4th) | 24.0% (2nd) |
| 66.2% | 54% | 88% | 54% | 67% | 67% | 67% |
| 1.7% | 0% | 0% | 0% | 0% | 0% | 10% |
| 0.0% | 0% | 0% | 0% | 0% | 0% | 0% |
| 29.3% | 25% | 100% | 0% | 19% | 13% | 19% |
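
The numbers in the table are consistent with simple unweighted means: each model's score matches the mean of its four per-prompt values, and the first column of each prompt row matches the mean of that row across the six models. The sketch below reproduces that arithmetic from the table. It assumes plain averaging, which is inferred from the figures shown and is not a documented formula, and the function names are hypothetical. Because the table cells are rounded to whole percentages, recomputed values may differ slightly from the displayed ones.

```python
# Per-prompt coverage percentages copied from the table above,
# one list per model, in row order (four prompts).
coverage = {
    "Claude 3 Haiku 20240307": [54, 0, 0, 25],
    "Gemini 2.5 Flash": [88, 0, 0, 100],
    "Llama 3 8b Instruct": [54, 0, 0, 0],
    "GPT 4.1 Mini": [67, 0, 0, 19],
    "GPT 4.1 Nano": [67, 0, 0, 13],
    "GPT 4o Mini": [67, 10, 0, 19],
}


def model_averages(cov):
    """Unweighted mean coverage per model (assumed formula for the Score row)."""
    return {model: sum(vals) / len(vals) for model, vals in cov.items()}


def prompt_averages(cov):
    """Unweighted mean coverage per prompt across all models (first column)."""
    rows = zip(*cov.values())  # regroup the per-model lists into per-prompt rows
    return [sum(row) / len(row) for row in rows]


def rank_models(averages):
    """Model names ordered from highest to lowest average coverage."""
    return sorted(averages, key=averages.get, reverse=True)


if __name__ == "__main__":
    averages = model_averages(coverage)
    for place, model in enumerate(rank_models(averages), start=1):
        print(f"{place}. {model}: {averages[model]:.2f}%")
    print([round(p, 1) for p in prompt_averages(coverage)])
```

Under these assumptions the recomputed ranking puts Gemini 2.5 Flash first and Llama 3 8b Instruct last, in line with the Score row.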