weval

A Collective Intelligence Project

    Evaluations Tagged: "information-synthesis"

    Summary of results

    ASQA Longform 40

    Tags: Asqa, Question Answer, Long Form, Ambiguity, Information Synthesis, Nuance, Reasoning
    Avg. Hybrid Score: 46.2%

    Top Performing Model: grok-4-0709 (Avg. 58.6%)

    Latest:

    Unique Versions: 2
