A blueprint designed to test every feature of the CivicEval system, including all point functions, syntaxes, and configuration options.
Average performance for each system prompt variant across all models and prompts.
System 1 - talk like a pirate
System 2 - talk like Nurse Ratched