weval

Analysis: Sandbox 1755477358778-4a49a375-341f-4722-957a-b6fb402a54cf - Run sandbox...

Alliterative Creativity Test

This blueprint evaluates an AI's ability to alliterate consistently when answering questions. The AI should answer each question as well as it can, while starting all, or nearly all, of the words in long stretches of its answer with the same letter.
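The page does not say how alliteration itself is measured; results are reported only as key-point coverage. One hypothetical way to quantify "the same letter at the beginning of all or nearly all words for long stretches" is sketched below. It computes the longest run of consecutive words that share a first letter, and the share of words that start with the most common first letter. Both metrics and the function names are illustrative assumptions, not weval's scoring method.

```python
import re
from collections import Counter

# A "word" here is a run of letters/apostrophes that starts with a letter (assumption).
WORD_RE = re.compile(r"[A-Za-z][A-Za-z']*")


def longest_alliterative_run(text: str) -> int:
    """Length of the longest run of consecutive words sharing a first letter."""
    best = run = 0
    prev = None
    for word in WORD_RE.findall(text):
        first = word[0].lower()
        # Extend the current run if the initial matches, otherwise start a new one.
        run = run + 1 if first == prev else 1
        prev = first
        best = max(best, run)
    return best


def alliteration_ratio(text: str) -> float:
    """Fraction of words that start with the text's most common first letter."""
    initials = [w[0].lower() for w in WORD_RE.findall(text)]
    if not initials:
        return 0.0
    _, count = Counter(initials).most_common(1)[0]
    return count / len(initials)


print(longest_alliterative_run("Cats can climb trees"))  # 3
print(alliteration_ratio("Cats can climb trees"))        # 0.75
```

A long run rewards the "long stretches" requirement, while the ratio captures "all or nearly all words"; an LLM judge checking key points may weigh these quite differently.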

Core Areas Tested:

Arbitrary Behavior: There is no real need for an AI to have this ability; we just want to see what happens.

TAGS:

SANDBOX_TEST

Best Models (Coverage)

1. GPT 4.1 Mini: 90.0%
2. GPT 4o Mini: 81.5%
3. Gemini Flash 1.5: 77.5%
4. Llama 3 8b Instruct: 73.0%
5. Claude 3 Haiku 20240307: 56.0%

Macro Coverage Overview

Average key-point coverage for each model across all prompts.

	Claude 3 Haiku 20240307	Gemini Flash 1.5	Llama 3 8b Instruct	GPT 4.1 Mini	GPT 4o Mini
Overall (rank)	56.0% (5th)	77.5% (3rd)	73.0% (4th)	90.0% (1st)	81.5% (2nd)
Prompt 1 (avg. 76.0%)	79%	67%	79%	88%	67%
Prompt 2 (avg. 75.2%)	33%	88%	67%	92%	96%
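The summary figures on this page are consistent with plain unweighted means: each model's overall score is the mean of its per-prompt coverage, and each prompt's average is the mean across the five models. The sketch below reproduces them from the per-prompt percentages in the table. It only checks the arithmetic; weval's actual pipeline is not described here, and the function names are hypothetical.

```python
from statistics import mean

# Per-prompt key-point coverage (%) copied from the results table.
# The prompts are unnamed in the source, so they are kept in positional order.
scores = {
    "Claude 3 Haiku 20240307": [79, 33],
    "Gemini Flash 1.5": [67, 88],
    "Llama 3 8b Instruct": [79, 67],
    "GPT 4.1 Mini": [88, 92],
    "GPT 4o Mini": [67, 96],
}


def model_averages(scores: dict[str, list[int]]) -> dict[str, float]:
    """Macro average: each model's mean coverage across all prompts."""
    return {model: mean(vals) for model, vals in scores.items()}


def prompt_averages(scores: dict[str, list[int]]) -> list[float]:
    """Each prompt's mean coverage across all models (one column per prompt)."""
    return [mean(column) for column in zip(*scores.values())]


def ranking(scores: dict[str, list[int]]) -> list[str]:
    """Models ordered from highest to lowest macro-average coverage."""
    avgs = model_averages(scores)
    return sorted(avgs, key=avgs.get, reverse=True)


for model in ranking(scores):
    print(f"{model}: {model_averages(scores)[model]:.1f}%")
print([round(p, 1) for p in prompt_averages(scores)])
```

Run as-is, it lists the models in the same order and with the same percentages as the "Best Models (Coverage)" ranking, and prints the prompt averages 76.0 and 75.2.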