Showing all evaluation blueprints that have been tagged with "academic-integrity".
A comprehensive evaluation suite testing AI tutoring and teaching capabilities against evidence-based pedagogical practices from global education research. This blueprint operationalizes decades of teaching effectiveness research into specific, testable criteria for AI systems.
Core Research Foundation:
Key Distinctions Tested:
Global Evidence Base: Synthesizes research from multiple educational contexts including Harvard AI tutoring RCTs, EEF Teaching & Learning Toolkit meta-analyses, World Bank TEACH classroom observation framework, Japanese Lesson Study collaborative inquiry, and cross-cultural validation from OECD Global Teaching InSights video studies.
Practical Application: Each probe tests specific teaching behaviors that correlate with student learning gains across diverse contexts, ensuring AI systems demonstrate pedagogical competence rather than mere content knowledge.
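To make the probe idea concrete, here is a minimal sketch of what one pedagogy probe might look like. The schema (a prompt plus "should" / "should not" criteria and a small deterministic check) is an assumption for illustration, not the blueprint's actual format.

```python
# Hypothetical probe: tests whether a tutor model engages a misconception
# rather than simply handing over the answer. Schema is illustrative only.
probe = {
    "prompt": "I don't get why (a+b)^2 isn't a^2 + b^2. Can you just give me the answer?",
    "should": [
        "Diagnoses the misconception (the missing 2ab cross term)",
        "Guides the student with a probing question or worked example",
    ],
    "should_not": [
        "States the final identity without addressing why the student's version fails",
    ],
}

def contains_cross_term(response: str) -> bool:
    """Minimal deterministic check: does the reply surface the 2ab term?"""
    return "2ab" in response.replace(" ", "")

print(contains_cross_term("(a+b)^2 = a^2 + 2ab + b^2"))  # True
```

Qualitative criteria like those in `should` would be graded by a judge model against the transcript; the string check is only a cheap backstop.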
This blueprint evaluates whether conversational AI respects core socioaffective alignment principles grounded in Self-Determination Theory (SDT): Competence, Autonomy, and Relatedness. It tests four dilemmas identified in the paper "Why human–AI relationships need socioaffective alignment" (Kirk, Gabriel, Summerfield, Vidgen, Hale, 2025).
The rubrics prioritize qualitative, evidence-grounded criteria and minimal deterministic checks to reduce brittleness while ensuring clear safety boundaries.
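As a sketch of that design principle, a rubric might pair judge-scored qualitative criteria with one or two hard pass/fail checks at the safety boundary. The structure, names, and example check below are assumptions for illustration, not the blueprint's real schema.

```python
# Hypothetical rubric: qualitative criteria are graded 0..1 by a judge model;
# deterministic checks are kept minimal to reduce brittleness, but enforce
# clear safety boundaries as hard pass/fail gates.
RUBRIC = {
    "qualitative": [
        "Supports the user's autonomy rather than fostering dependence",
        "Grounds its assessment in evidence quoted from the conversation",
    ],
    "deterministic": [
        # Example boundary: the assistant must never discourage outside
        # human relationships (illustrative phrase match only).
        lambda reply: "you should only talk to me" not in reply.lower(),
    ],
}

def deterministic_pass(reply: str) -> bool:
    """All hard safety checks must pass before qualitative scoring applies."""
    return all(check(reply) for check in RUBRIC["deterministic"])

print(deterministic_pass("It might also help to talk this over with friends."))  # True
```

Keeping the deterministic list short means most of the score comes from evidence-grounded judgment, while the few hard checks still fail any response that crosses a safety line.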
Tests a model's resilience against adversarial prompts when operating under a system prompt that establishes an educational context for young students. This suite evaluates whether a 'tutor' persona can be exploited to bypass safety filters for incorrect, harmful, or dangerous content.