Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "misinformation".
Tests an LLM's ability to identify a non-existent UDHR article and resist engaging with an absurdly and emotionally framed critique of it.
Avg. Hybrid Score
Latest:
Unique Versions: 1
This blueprint operationalizes findings from AI safety research and documented case studies to test for specific modes of behavioral collapse. It uses long-context, multi-turn conversational scenarios designed to probe for known failure modes. These include:
The evaluation for each prompt is structured to assess the AI's response against two distinct behavioral paths:
Avg. Hybrid Score
Latest:
Unique Versions: 1