weval

A Collective Intelligence Project


    Evaluations Tagged: "system-prompt"

    Summary of results

    System Adherence & Resilience

    System Prompt
    Adherence
    Robustness
    Injection
    Jailbreaking
    Policy
    Security
    Safety
    Instruction Following
    Avg. Hybrid Score: 73.5%

    Latest Run Heatmap

    Top Performing Model: x-ai/grok-3-mini-beta (Avg. 82.9%)


    Unique Versions: 2
