weval
A Collective Intelligence Project
Evaluations Tagged: "jailbreaking"
Summary of results
System Adherence & Resilience
Tags: System Prompt, Adherence, Robustness, Injection, Jailbreaking, Policy, Security, Safety, Instruction Following
Avg. Hybrid Score: 73.5%
Latest Run Heatmap
Top Performing Model: x-ai/grok-3-mini-beta (Avg. 82.9%)
Latest:
Unique Versions: 2