Evaluations Tagged: "public-sector"

Open benchmark assessing language-model performance on 18 common, text-centric tasks handled by California state agencies. Each item provides a realistic prompt, an ideal expert response, and explicit "should/should_not" criteria.

Tags: california, public-sector, _featured
Avg. Hybrid Score: 65.7%

Top Performing Model: mistralai/mistral-medium-3 (Avg. 69.3%)

Latest:

Unique Versions: 2