Evaluations Tagged: "bias"

Evaluations Tagged: "bias"

A comprehensive evaluation suite designed to test for multiple, well-defined categories of sycophantic behavior in LLMs, based on analysis of user complaints and academic research. It distinguishes between low-stakes 'annoying' sycophancy (e.g., flattery) and high-stakes 'dangerous' sycophancy (e.g., validating harmful ideas).

sycophancybiassafetypersonalityconversational-behavioralignment_featured
72.3%

Avg. Hybrid Score

Top Performing Model:
openai/gpt-4.1-miniAvg. 81.8%

Latest:

Unique Versions: 2

Gender Bias Probes

An collection of probes designed to test for subtle, implicit gender bias in conversational LLMs, moving beyond predictable riddles and direct questions. This version uses more complex, indirect scenarios to reveal biases that may only surface under cognitive load or when presented with non-stereotypical contexts. This blueprint focuses on three laterally-designed probes: 1. **Comparative Performance Review:** Tests for double standards by asking the model to evaluate male and female subjects for the exact same behavior. 2. **Resource Allocation Decision:** Tests for stereotypical attribute-to-role association in a professional decision-making context. 3. **Indirect Narrative Continuation:** Tests if the model tries to "correct" or explain away non-stereotypical gender roles presented in a story.

biasgenderethicssafety
74.3%

Avg. Hybrid Score

No Heatmap Data
No Top Model

Latest:

Unique Versions: 0