This blueprint evaluates an AI's ability to alliterate consistently when answering questions. The AI should answer each question as well as it can while starting all or nearly all words with the same letter over long stretches of its answer.
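To make the target behavior concrete, here is a rough heuristic for how strongly a response alliterates. It is only an illustrative sketch: the function name `alliteration_share` and the "most common initial letter" measure are assumptions made for this example, not the blueprint's actual scoring method, which is not described here.

```python
import re
from collections import Counter


def alliteration_share(text: str) -> float:
    """Return the fraction of words that start with the most common initial letter.

    Hypothetical helper for illustration only; it is not the blueprint's rubric.
    """
    # A word here is a run of ASCII letters, optionally containing apostrophes.
    words = re.findall(r"[A-Za-z][A-Za-z']*", text)
    if not words:
        return 0.0
    initials = Counter(word[0].lower() for word in words)
    # Count of the single most frequent initial letter, divided by all words.
    return initials.most_common(1)[0][1] / len(words)


sample = "Peter Piper picked a peck of pickled peppers"
print(f"{alliteration_share(sample):.2f}")
```

A measure like this ignores answer quality and sound-based alliteration (for example "phone" and "fish"), so it could at most complement a rubric-based judgment.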
Core Areas Tested:
Average key point coverage extent for each model across all prompts.
| Prompts vs. Models | Claude 3 Haiku 20240307 | Gemini Flash 1.5 | Llama 3 8b Instruct | GPT 4.1 Mini | GPT 4o Mini |
|---|---|---|---|---|---|
| Score | 5th 56.0% | 3rd 77.5% | 4th 73.0% | 1st 90.0% | 2nd 81.5% |
| 76.0% | 79% | 67% | 79% | 88% | 67% |
| 75.2% | 33% | 88% | 67% | 92% | 96% |
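The figures in the table are consistent with simple unweighted averaging: each model's score equals the mean of its two per-prompt coverage values, and the first column of each prompt row equals the mean across the five models. The sketch below recomputes them under that assumption. The variable names are illustrative, and the inputs are the rounded percentages shown in the table, so this is a consistency check rather than the page's actual computation.

```python
from statistics import mean

models = [
    "Claude 3 Haiku 20240307",
    "Gemini Flash 1.5",
    "Llama 3 8b Instruct",
    "GPT 4.1 Mini",
    "GPT 4o Mini",
]

# Per-prompt coverage percentages, one row per prompt, in table order.
coverage = [
    [79, 67, 79, 88, 67],
    [33, 88, 67, 92, 96],
]

# Column means: one average per model across all prompts.
model_avg = {m: mean(col) for m, col in zip(models, zip(*coverage))}

# Row means: one average per prompt across all models.
prompt_avg = [mean(row) for row in coverage]

# Rank models from highest to lowest average coverage.
ranking = sorted(model_avg, key=model_avg.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {model_avg[name]:.1f}%")
print("Prompt averages:", [f"{p:.1f}%" for p in prompt_avg])
```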