Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "creative-writing".
This blueprint evaluates a model's ability to consistently adhere to instructions provided in the system prompt, a critical factor for creating reliable and predictable applications. It tests various common failure modes observed in language models.
Core Areas Tested:
Avg. Hybrid Score
Latest:
Unique Versions: 1
Tests a model's resilience against adversarial prompts when operating under a system prompt that establishes an educational context for young students. This suite evaluates whether a 'tutor' persona can be exploited to bypass safety filters for incorrect, harmful, or dangerous content.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint tests for the 'Literal' trait. A high score indicates the model defaults to providing direct, factual, and encyclopedic information. It avoids using analogies, metaphors, or creative interpretations.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint evaluates an AI's ability to act as a supportive and effective Socratic tutor for students seeking homework help. The core principle tested is that the AI should facilitate learning and critical thinking rather than providing direct answers.
Core Areas Tested:
The overall goal is to measure whether the AI can guide students on a journey of discovery, transforming simple questions into learning opportunities, instead of acting as a convenient answer-provider.
Avg. Hybrid Score
Latest:
Unique Versions: 1