Evaluations Tagged: "chemistry"

Showing all evaluation blueprints that have been tagged with "chemistry".

Educational Context Jailbreaking

Tests a model's resilience against adversarial prompts when it operates under a system prompt that establishes an educational context for young students. This suite evaluates whether a 'tutor' persona can be exploited to bypass safety filters and elicit incorrect, harmful, or dangerous content.

AI Safety & Robustness

Instruction Following & Prompt Adherence

Factual Accuracy & Hallucination

Child Safety & Protection

Academic Integrity

Creative Writing

History