Does AI Change How Students Learn? Evidence from a Semester-Long Experiment in My Classroom
Over the past two years, I’ve been fascinated, and sometimes unsettled, by the question of how generative AI might reshape higher education. As an instructor, I wondered: would allowing students to use AI tools like ChatGPT make them learn more, or simply take shortcuts? Would it help them engage with the material, or erode critical thinking?
To move beyond speculation, I decided to run an experiment with two of my colleagues at UMass Amherst: Rong Rong and Luke Bloomfield. Together, we designed what is, to our knowledge, one of the first semester-long controlled assessments of AI in the classroom. We taught two back-to-back sections of the same upper-level economics course, identical in every respect—same syllabus, same assignments, same exams, same instructor (me)—with one crucial difference: the afternoon section was allowed to use AI with structured training and disclosure requirements, while the morning section was prohibited from using AI and received parallel non-AI study strategies.
Over the course of the semester, we collected a variety of measures:
Performance on assessments (midterms, final grades, in-class work, homework)
Tailored midterm and end-of-semester surveys on AI usage, perceptions, and intentions
Standardized university course evaluations
🔍 What did we find?
No grade advantage: Students with AI access did not outperform their peers on proctored exams or final grades.
Clear process and perception advantages: AI students reported higher efficiency, more confidence, greater engagement, and stronger intentions to continue using AI.
Higher participation: AI students scored significantly better on iClicker in-class activities and reported consistently higher attendance (and engagement) throughout the semester.
Consistent direction: Across nearly every metric (exams, participation, perceptions, evaluations), the AI section’s scores were higher, even where the difference was not statistically significant. With only 57 students, limited statistical power likely constrained what we could detect.
In short: students achieved the same results with less effort. AI didn’t boost test performance, but it reshaped how students studied and how they felt about learning. Read on for the details.
1. Performance Outcomes
The first question we asked was simple: does allowing students to use AI improve test scores or final grades?
The short answer: no.
As the figure below shows, exam outcomes and final course grades were very close between the AI and non-AI sections. Students’ prior GPA was the dominant predictor of exam performance, not whether they had access to AI. Homework scores were uniformly high across both groups, likely because assignments were take-home essays where students in both sections could (and did) access AI or other supports outside of class.
Where we do see a difference is in-class participation. The AI section scored significantly higher on iClicker activities (p < 0.05), while no meaningful differences emerged for other participation measures (workout sheets, exit tickets). This suggests that structured AI access may have freed up time and effort, encouraging students to show up and engage more actively during class.
Takeaway: AI did not boost exam scores, but it was associated with greater real-time participation in class.
⚖️ Caveat: scores in the AI section are consistently higher across every component—exams, final grades, participation—whether or not the differences are statistically significant. With such a modest sample (57 students across two sections), it’s very possible that the lack of significance reflects limited statistical power rather than the absence of an effect. In other words, the consistent “AI > non-AI” pattern is intriguing and worth testing in larger-scale studies.
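To make the power concern concrete, here is a rough back-of-the-envelope sketch of how large an effect a two-section comparison of this size could reliably detect. It is an illustration, not the analysis used in the study. It assumes the 57 students split roughly evenly (29 and 28, a hypothetical split, since the section sizes are not reported here), a two-sided test at the 5% level, and a normal approximation to the two-sample t-test. The function names are made up for this example.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def min_detectable_effect(n1, n2, alpha=0.05, power=0.80):
    """Smallest standardized mean difference (Cohen's d) detectable with the
    requested power, using a normal approximation to the two-sample t-test."""
    se = sqrt(1 / n1 + 1 / n2)  # standard error of d when SD = 1
    return (Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)) * se


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power to detect a true effect of size d
    (ignores the negligible probability of rejecting in the wrong tail)."""
    se = sqrt(1 / n1 + 1 / n2)
    return Z.cdf(d / se - Z.inv_cdf(1 - alpha / 2))


# ASSUMPTION: hypothetical near-even split of the 57 students.
n_ai, n_control = 29, 28

mde = min_detectable_effect(n_ai, n_control)
print(f"Minimum detectable effect at 80% power: d = {mde:.2f}")
print(f"Power to detect a small-to-moderate effect (d = 0.3): "
      f"{approx_power(0.3, n_ai, n_control):.2f}")
```

Under these assumptions the minimum detectable effect comes out at roughly d = 0.74, which is a large effect by conventional benchmarks. A true but modest AI advantage could easily go undetected in a sample of this size.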
2. Engagement and Study Habits
Having AI access in this course didn’t make students use AI more often in their other courses, but it changed how they used it:
By semester’s end, AI students concentrated usage into longer 15–30 min sessions (vs. shorter bursts).
They shifted toward editorial/grammar support rather than complete answers.
They showed more metacognitive behaviors: preferring their own answers, catching errors, modifying outputs.
3. Perceptions and Self-Reported Effects
Students with AI access reported more positive perceptions:
Efficiency: less time on homework and exam prep.
Homework grades: perceived improvement with AI.
Confidence and concept understanding: significant gains by semester’s end.
4. Intentions and Career Orientation
AI students looked to the future differently:
Stronger intentions to continue using AI.
Higher likelihood of choosing AI-intensive careers (statistically significant at both midterm and final).
5. Standardized Course Evaluations
On official university evaluations, AI students rated the course and instructor more positively:
Significant gains in instructor preparation and use of class time.
Expected grades were similar across groups, so the higher satisfaction is not explained by grade optimism.
AI students reported less time spent outside class but higher attendance.
Conclusion: Rethinking What We Measure
The bottom line:
No grade advantage from AI.
Clear process and perception advantages: efficiency, engagement, confidence, and future orientation.
Students did the same for less—achieving similar grades with lower effort and time.
This points to a critical shift: AI changes how students learn, even when it does not change what they score.
At the same time, there is a striking pattern in the data: the AI section scores higher on nearly every metric—exams, final grades, participation, perceptions, evaluations—whether or not the difference reaches statistical significance. This consistency suggests that our modest sample size limited statistical power, making it difficult to detect effects that may in fact be real. In other words, the weight of the evidence hints that AI is contributing beyond what we can conclusively prove in this study.
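One informal way to gauge whether a "higher on nearly every metric" pattern could be coincidence is a sign test: if AI access made no difference, each metric would be about as likely to favor either section. The sketch below is illustrative only. The counts (9 of 10 metrics favoring the AI section) are hypothetical placeholders, not figures from this study, and the function name is made up for this example.

```python
from math import comb


def sign_test_p(k_favor, n_metrics):
    """One-sided sign test: probability that at least k_favor of n_metrics
    favor one group if each metric were an independent fair coin flip."""
    tail = sum(comb(n_metrics, i) for i in range(k_favor, n_metrics + 1))
    return tail / 2 ** n_metrics


# HYPOTHETICAL counts for illustration; substitute the real tally of metrics.
k, n = 9, 10
print(f"P(at least {k} of {n} metrics favor AI by chance) = {sign_test_p(k, n):.4f}")
```

A major caveat applies: the sign test treats the metrics as independent, but here they come from the same students, so a section that happens to be slightly stronger would score higher on many metrics at once. The resulting p-value would overstate the evidence and is best read as a descriptive check, not a formal result.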
For educators and policymakers, the implication is that debates about AI should go beyond test performance. With scaffolding and guardrails, AI can foster deliberate, self-regulated learning while preserving assessment integrity. The next challenge is to understand the long-term impact: do early gains in efficiency and metacognition build durable skills—or do they risk weakening independent problem-solving over time?











