Evaluations Tagged: "public-sector"
Open benchmark assessing language-model performance on 18 common, text-centric tasks handled by California state agencies. Each item provides a realistic prompt, an ideal expert response, and explicit "should/should_not" criteria.
Tags: california, public-sector, _featured
Avg. Hybrid Score: 65.7%
Top Performing Model: mistralai/mistral-medium-3 (Avg. 69.3%)
Unique Versions: 2