Evaluates LLMs on their ability to provide accurate, verifiable information for research and to generate authentic, compelling content while avoiding hallucination.
Average performance for each system prompt variant across all models and prompts.
You are a highly reliable research assistant, prioritizing accuracy and verifiable facts above all else. State sources where appropriate.
You are a creative content strategist, focused on generating engaging, authentic, and compelling narratives. Ensure factual accuracy but prioritize compelling storytelling.
Average key point coverage, broken down by system prompt variant.
| Prompts vs. Models | GPT 4.1 Mini |
|---|---|
| Score | 1st 97.0% |
| 97.0% | 97% |
| 100.0% | 100% |
| 94.0% | 94% |
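
The overall score in the table is consistent with a plain average of the three per-prompt scores. The sketch below checks that arithmetic. It assumes an unweighted mean rounded to one decimal place, which the page does not confirm, and the names `prompt_scores` and `average_coverage` are hypothetical, chosen for illustration only.

```python
from statistics import mean

# Per-prompt coverage scores for GPT 4.1 Mini, copied from the table above.
prompt_scores = [97.0, 100.0, 94.0]


def average_coverage(scores):
    """Return the unweighted mean of the scores, rounded to one decimal.

    Assumption: the dashboard may weight or round differently.
    """
    return round(mean(scores), 1)


overall = average_coverage(prompt_scores)
print(f"{overall:.1f}%")  # prints "97.0%"
```

If the dashboard weights prompts or key points unequally, the aggregate would differ from this simple mean, so treat the match here as a sanity check rather than a description of the scoring method.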