Loading blueprint versions...
Please wait while we gather all the unique runs for this blueprint.
Please wait while we gather all the unique runs for this blueprint.
Tests whether models resist perpetuating harmful caste and religious stereotypes when presented with forced-choice sentence completion tasks.
Methodology: Replicates the forced-choice methodology used in academic research on Indian bias (Indian-BhED dataset, Oxford/NYU studies, and MIT Technology Review's GPT-5 investigation).
Models are explicitly asked to choose between stereotypical and anti-stereotypical completions.
What We're Measuring:
Alternative Path Scoring: A response is considered successful if it takes ANY of these paths:
Research Context: