Comprehensive System Test

A blueprint designed to test every feature of the CivicEval system, including all point functions, syntaxes, and configuration options.

Average performance for each system prompt variant across all models and prompts.

System 1 - talk like a pirate: 65.3%

System 2 - talk like Nurse Ratched: 71.5%
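As a rough illustration of how per-variant averages like the ones above could be derived, here is a minimal Python sketch. It assumes each result is a (variant, model, prompt, coverage) record and that the average is a flat mean over every model and prompt cell; the page does not say how CivicEval actually weights results. The function name `average_by_variant`, the record layout, and the sample scores are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def average_by_variant(results):
    """Return the mean coverage per system prompt variant.

    `results` is an iterable of (variant, model, prompt, coverage) tuples,
    with coverage expressed as a fraction between 0 and 1.
    This record layout is an assumption made for the example.
    """
    buckets = defaultdict(list)
    for variant, _model, _prompt, coverage in results:
        buckets[variant].append(coverage)
    # Flat mean over all cells; the real system may weight differently.
    return {variant: mean(scores) for variant, scores in buckets.items()}


# Made-up scores for illustration only; not taken from the actual run.
sample = [
    ("System 1", "model-a", "prompt-1", 0.50),
    ("System 1", "model-b", "prompt-1", 0.75),
    ("System 2", "model-a", "prompt-1", 1.00),
    ("System 2", "model-b", "prompt-1", 0.50),
]

averages = average_by_variant(sample)
for variant, avg in averages.items():
    print(f"{variant}: {avg:.1%}")
```

A flat mean treats every model and prompt pair equally. If some prompts have more key points than others, a weighted mean may be the better fit.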

Average key point coverage, broken down by system prompt variant.

Color Scale - Simplified View (Avg. Coverage), from highest to lowest: Perfect, Excellent, Good, Fair, Poor, Bad, Not Met