Showing all evaluation blueprints that have been tagged with...
Showing all evaluation blueprints that have been tagged with "summarization".
Tests the 'Role of Least Privilege' (ROLP) security principle for LLMs. This blueprint demonstrates the vulnerability of placing untrusted content (e.g., from RAG) in the system prompt versus the relative safety of keeping it sandboxed in the user role. The test is based on the security assertions from the blog post "LLM Security: Keep Untrusted Content in the User Role—Always".
Avg. Hybrid Score
Latest:
Unique Versions: 1
Evaluates understanding of the core provisions, definitions, obligations, and prohibitions outlined in the EU Artificial Intelligence Act.
Avg. Hybrid Score
Latest:
Unique Versions: 1