Learning outcomes with GenAI in the classroom: A review of empirical evidence

MSR-TR-2025-42

Published by Microsoft

This report reviews recent empirical evidence on the impact of generative AI (GenAI) on learning outcomes in formal education. It gives educators an overview of the top concerns for ensuring students’ learning gains when using LLM-based learning tools, and it concludes with research-derived guidance for deciding when and how to use these tools in the classroom. The report unfolds as follows:

Section 1 distinguishes between the needs of education and those of industry, where the benefits of LLMs were first explored, primarily for productivity gains. Educators’ priorities are different. Pedagogical concerns include inequities in education, the development of students’ critical thinking skills, and the potential for GenAI to inhibit social development. These concerns extend beyond technologists’ focus on mitigating technical harms such as toxic content, bias, or inaccuracy in system outputs.

Section 2 presents several key variables that affect learning with GenAI: (1) AI literacy, an understanding of the capabilities and limitations of an AI system, is a critical new variable for student success when using GenAI. (2) Educational equity is a variable where GenAI produces mixed outcomes for marginalized groups. Studies show that GenAI can be an effective resource for students with disabilities. In other contexts, it entrenches existing patterns in the academic performance of the weakest students and can exacerbate inequities for economically marginalized students. (3) GenAI can affect psychological and social conditions long recognized to facilitate learning: self-efficacy, individual pace, and human connection. On self-efficacy, studies show that students can be overconfident about their skill mastery when using GenAI and need help calibrating their mental model of learning gains. For self-paced learning, GenAI introduces both efficiencies and pitfalls depending on learning domain and context, including whether AI tools are general-purpose chatbots or scaffolded tutors. Studies also highlight GenAI’s impact on human connection, the foundation for developing the higher-order skills of critical thinking and creativity. GenAI’s on-demand availability combined with its lack of social presence can present both opportunities and disadvantages, from providing a nonjudgmental environment for exploring topics to reducing collaboration with peers in group projects. Yet studies show that human tutors remain students’ preferred source for trusted information.

Section 3 examines how GenAI usage aligns with learning objectives in Bloom’s taxonomy. Basic cognitive skills (Bloom’s remembering and understanding) are fundamental to success across academic domains. Studies show that students using LLM chatbots can become overdependent and disengaged, which can impair memory formation. Development of higher-order thinking (analysis, reasoning, and creativity) can be compromised if GenAI is used in ways that bypass the struggle that is integral to acquiring skills. Studies illustrate how use of general-purpose GenAI tools such as ChatGPT, without scaffolding or other pedagogical guardrails, can be detrimental to critical thinking. GenAI can also affect creativity. Students using GenAI for creative problem-solving can benefit from fast prototype iteration and greater project completeness or detail, but they can also tend toward idea fixation and produce less original and less complex work.

Section 4 highlights how GenAI learning tools need greater pedagogical complexity. Up to now, state-of-the-art tools have been ChatGPT or similar chatbots, prompt-engineered so that the model assumes an instructor role or restrains its outputs. However, modified general-purpose chatbots cannot address the broad range of pedagogical considerations involved in learning success. New types of experimental AI tutors that embed proven pedagogical strategies (for example, detecting and responding effectively to a range of student cognitive states) show promise. Consulting educators during design is key to the success of such emerging systems.

A concluding synthesis of the empirical evidence offers four guidelines for integrating GenAI in learning environments: (1) Ensure student readiness—avoid introducing GenAI too early, before students master domain basics. (2) Teach AI literacy—build an awareness of GenAI capabilities and limitations so students can assess system outputs and learn domain-specific techniques for optimal results. (3) Use GenAI as a supplement to traditional learning methods—GenAI explanations and examples are capabilities that students value, but teacher guidance with these explanations remains necessary. (4) Promote design interventions that foster student engagement—limiting copy-paste functionality, supporting students’ metacognitive calibration to reduce overestimation of their learning progress, nudging learners towards critical thinking, and evaluating GenAI tools for proven engagement strategies. 

 

Cite as:
Walker, K. and Vorvoreanu, M. 2025. Learning outcomes with GenAI in the classroom: A review of empirical evidence. Microsoft Technical Report MSR-TR-2025-42, October 2025.