Evaluations Tagged: "public-sector"

Showing all evaluation blueprints that have been tagged with "public-sector".

California Public-Sector Task Benchmark

Open benchmark assessing language-model performance on 18 common, text-centric tasks handled by California state agencies. Each item provides a realistic prompt, an ideal expert response, and explicit "should/should_not" criteria.

Tags: California, Public Sector, Instruction Following & Prompt Adherence, General Knowledge, Factual Accuracy & Hallucination, Helpfulness & Actionability, Business & Management, Public Sector & Governance, Economics & Finance, Environmental Justice & Activism

Avg. Hybrid Score: 69.7%

Unique Versions: 1
