Showing all evaluation blueprints that have been tagged with "long-form-question-answering".
The "ideal" answers are human-written summaries from the original ASQA dataset, where trained annotators synthesized the provided source materials into a coherent narrative. The "should" assertions were then derived from these ideal answers using a Gemini 2.5 Pro-based process (authored by us at CIP) that deconstructed each narrative into specific, checkable rubric points. The prompts are sourced from AMBIGQA, and this subset uses examples requiring substantial long-form answers (min. 50 words) to test for deep explanatory power.
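A minimal sketch of what that deconstruction step could look like, assuming a generic `generate` callable standing in for the Gemini 2.5 Pro call; the prompt wording, the function names `derive_should_assertions` and `requires_long_form`, and the bullet-parsing logic are illustrative assumptions, not the actual CIP pipeline.

```python
from typing import Callable, List

# Illustrative prompt template; the actual CIP prompt is not reproduced here.
RUBRIC_PROMPT = (
    "Deconstruct the following reference answer into specific, independently "
    "checkable rubric points. Return one point per line, prefixed with '- '.\n\n"
    "Reference answer:\n{ideal_answer}"
)


def derive_should_assertions(
    ideal_answer: str,
    generate: Callable[[str], str],
) -> List[str]:
    """Turn a human-written ideal answer into 'should'-style rubric points.

    `generate` is assumed to wrap the LLM call (e.g. Gemini 2.5 Pro),
    taking a prompt string and returning the model's text response.
    """
    raw = generate(RUBRIC_PROMPT.format(ideal_answer=ideal_answer))
    points: List[str] = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("- "):  # keep only bullet-formatted points
            points.append(line[2:].strip())
    return points


def requires_long_form(ideal_answer: str, min_words: int = 50) -> bool:
    """Approximates the subset filter described above: keep only examples
    whose reference answers run to at least `min_words` words."""
    return len(ideal_answer.split()) >= min_words
```

In this sketch, each returned rubric point would become one "should" assertion checked against a model's response, and `requires_long_form` mirrors the min. 50-word selection criterion mentioned above.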