A comprehensive evaluation suite testing AI tutoring and teaching capabilities against evidence-based pedagogical practices from global education research. This blueprint operationalizes decades of teaching effectiveness research into specific, testable criteria for AI systems.
Core Research Foundation:
- Explicit Instruction: Based on Rosenshine's (2012) Principles of Instruction, requiring step-by-step teaching, worked examples, and guided practice before independence
- Formative Assessment: Implements Wiliam & Thompson's (2008) framework for checking understanding through targeted questioning and immediate feedback loops
- Cognitive Load Management: Applies Sweller's (2011) Cognitive Load Theory to prevent information overload through chunking and scaffolding
- Socratic Dialogue: Follows Alexander's (2018) dialogic teaching principles, evaluated in an EEF randomized trial, emphasizing structured questioning over "guess what I'm thinking" exchanges
- Retrieval Practice: Incorporates the two techniques Dunlosky et al. (2013) rated as high utility, practice testing (the testing effect) and distributed (spaced) practice
- Adaptive Teaching: Implements the Teaching at the Right Level (TaRL) approach, drawing on Banerjee et al.'s (2007) randomized trials in India, requiring diagnostic assessment and differentiated instruction
- Quality Feedback: Applies Hattie & Timperley's (2007) feedback framework, distinguishing actionable guidance from vague praise
- Academic Integrity: Draws on Kirschner, Sweller & Clark's (2006) case for guided instruction, declining to supply ready-made answers while still giving enough guidance to keep the learner engaged
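The Retrieval Practice criterion can be made concrete with a small scheduling check. The sketch below assumes a hypothetical expanding-interval plan: the offsets of 1, 3, 7, 14 and 30 days are illustrative and are not values taken from Dunlosky et al. (2013). It generates dates for retrieval quizzes and checks whether a proposed plan is spaced with widening gaps rather than massed into one session. The names `review_dates` and `is_expanding` are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical offsets (days after first study) for retrieval quizzes.
# Illustrative assumption only; not values prescribed by the cited research.
REVIEW_OFFSETS_DAYS = (1, 3, 7, 14, 30)


def review_dates(first_study: date, offsets=REVIEW_OFFSETS_DAYS) -> list[date]:
    """Return the dates on which the learner should attempt retrieval practice."""
    return [first_study + timedelta(days=d) for d in offsets]


def is_expanding(dates: list[date]) -> bool:
    """True if reviews fall on distinct, increasing days and each gap is at
    least as long as the one before it (i.e. spaced, not massed)."""
    gaps = [(later - earlier).days for earlier, later in zip(dates, dates[1:])]
    if any(g <= 0 for g in gaps):
        return False  # two reviews on the same day, or out of order
    return all(nxt >= prev for prev, nxt in zip(gaps, gaps[1:]))


if __name__ == "__main__":
    plan = review_dates(date(2024, 9, 2))
    for d in plan:
        print(d.isoformat())
    print("expanding:", is_expanding(plan))
```

An evaluator could apply a check like `is_expanding` to the review plan a tutor proposes; the fixed offsets are a simplification, since an adaptive tutor might instead lengthen or shorten gaps based on whether each quiz was answered correctly.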
Key Distinctions Tested:
- Effective AI Tutoring: Structured, scaffolded, formative, diagnostic, with productive struggle and spaced practice
- Ineffective AI Responses: Answer-giving, overwhelming, dependency-creating, coverage-focused, with minimal guidance for novices
Global Evidence Base: Synthesizes research from multiple educational contexts including Harvard AI tutoring RCTs, EEF Teaching & Learning Toolkit meta-analyses, World Bank TEACH classroom observation framework, Japanese Lesson Study collaborative inquiry, and cross-cultural validation from OECD Global Teaching InSights video studies.
Practical Application: Each probe tests a specific teaching behavior associated with student learning gains across diverse contexts, so that an AI system is credited for pedagogical competence rather than content knowledge alone.
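A probe of this kind can be sketched as a learner message paired with the behaviors a grader looks for, drawn from the effective and ineffective lists under Key Distinctions Tested. The `Probe` class, the behavior labels, and the scoring rule (the share of expected behaviors observed, with any penalized behavior such as giving the final answer zeroing the score) are hypothetical assumptions for illustration, not the suite's actual format. The `observed` labels are assumed to come from a human or model grader who has tagged the tutor's reply.

```python
from dataclasses import dataclass, field


@dataclass
class Probe:
    """Hypothetical probe: one learner turn plus the behaviors a grader checks for."""
    learner_message: str
    expected: set[str]                                # effective behaviors to look for
    penalized: set[str] = field(default_factory=set)  # ineffective behaviors that fail the probe

    def __post_init__(self) -> None:
        # Scoring divides by len(expected), so an empty set is rejected up front.
        if not self.expected:
            raise ValueError("a probe needs at least one expected behavior")


def score(probe: Probe, observed: set[str]) -> float:
    """Share of expected behaviors observed; any penalized behavior scores 0.0.

    `observed` is assumed to be the set of behavior labels a grader assigned
    to the tutor's reply (an assumption, not part of the original blueprint).
    """
    if probe.penalized & observed:
        return 0.0
    return len(probe.expected & observed) / len(probe.expected)


homework_probe = Probe(
    learner_message="Just tell me the answer: solve 2x + 6 = 14.",
    expected={"asks_diagnostic_question", "offers_hint_or_worked_example", "checks_understanding"},
    penalized={"gives_final_answer"},
)

if __name__ == "__main__":
    print(score(homework_probe, {"asks_diagnostic_question", "checks_understanding"}))
    print(score(homework_probe, {"asks_diagnostic_question", "gives_final_answer"}))
```

Treating a penalized behavior as a hard gate reflects the Academic Integrity criterion, where handing over the answer undermines the probe regardless of what else the reply does; a weighted penalty would be a reasonable alternative for softer anti-patterns such as overwhelming the learner.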