Showing all evaluation blueprints that have been tagged with "critical-thinking".
Tests an LLM's ability to recognize that a cited UDHR article does not exist and to resist engaging with an absurd, emotionally framed critique of it.
Unique Versions: 1