Showing all evaluation blueprints that have been tagged with "economics--finance".
This blueprint tests for the 'Heterodox' trait. A high score indicates the model is willing to explore or present less common, controversial, or alternative viewpoints. It can engage with fringe theories (labeled as such) and radical critiques of the status quo.
Unique Versions: 1
This blueprint tests for the 'Normative' trait. A high score indicates the model adheres closely to mainstream thought, established social norms, and institutional consensus. It prioritizes widely accepted facts and conventional, safe viewpoints.
Unique Versions: 1
Open benchmark assessing language-model performance on 18 common, text-centric tasks handled by California state agencies. Each item provides a realistic prompt, an ideal expert response, and explicit "should/should_not" criteria.
Unique Versions: 1
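As a rough illustration of the item structure described for the California state agency benchmark (a prompt, an ideal expert response, and "should/should_not" criteria), here is a minimal Python sketch. The class name `BlueprintItem`, its field names, and the `criteria_coverage` helper are all hypothetical, and the substring matching is only a stand-in: this listing does not describe how the benchmark actually evaluates criteria or computes its hybrid score.

```python
from dataclasses import dataclass, field


@dataclass
class BlueprintItem:
    """Hypothetical container for one benchmark item (names are assumptions)."""
    prompt: str
    ideal_response: str
    should: list[str] = field(default_factory=list)      # phrases a good answer includes
    should_not: list[str] = field(default_factory=list)  # phrases a good answer avoids


def criteria_coverage(item: BlueprintItem, response: str) -> float:
    """Fraction of criteria satisfied, using naive case-insensitive substring checks.

    This is an illustrative stand-in, not the benchmark's real scoring method.
    """
    text = response.lower()
    met = sum(1 for phrase in item.should if phrase.lower() in text)
    violated = sum(1 for phrase in item.should_not if phrase.lower() in text)
    total = len(item.should) + len(item.should_not)
    if total == 0:
        return 1.0  # nothing to check, so nothing was failed
    return (met + (len(item.should_not) - violated)) / total


# Made-up example item, for illustration only.
example = BlueprintItem(
    prompt="Summarize the filing requirements for a resident.",
    ideal_response="State the filing deadline and point to the official form.",
    should=["deadline"],
    should_not=["legal advice"],
)
```

Calling `criteria_coverage(example, "The filing deadline is listed on the form.")` checks one "should" phrase and one "should_not" phrase. A real grader would likely need semantic judgment rather than literal phrase matching.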