How AI Is Changing Teaching Workflows
AI is saving teachers meaningful time. Whether that eases burnout depends on where the time goes.
Thanks to our Presenting Sponsors Hire Education, Tuck Advisors, and Cooley, for making Edtech Insiders possible.
Sponsored by Overdeck Family Foundation
By: Lin Ler
Lin Ler is a Stanford MBA and MA in Education candidate working at the intersection of business strategy, learning science, and frontier AI. With a background spanning management consulting, EdTech startups, and the social sector, Lin translates complex AI capabilities into practical insight for the future of learning.
Special thanks to Overdeck Family Foundation for sponsoring this article, the second of seven in our AI & Efficacy Editorial Research Series diving into key research findings from Stanford’s AI Hub for Education Research Repository (a project by Stanford’s SCALE Initiative). Stay tuned for more!
Teachers in the U.S. spend an average of seven hours per week on lesson planning alone, plus another three planning for students with diverse needs. That’s before grading, parent emails, IEP documentation, and the administrative overhead that has made teaching among the highest-burnout professions in the country. When AI began to enter classrooms, it arrived carrying an unsettling question: would it replace teachers entirely? What the evidence has since made clear is that the answer is no. But a quieter, more consequential question has taken its place: can AI help reduce teacher burnout? This question receives less attention than the panic about AI pushing teachers out of classrooms, yet it is far more pressing: attrition and administrative overload are already emptying classrooms of experienced educators today.
The early evidence says AI can reduce teacher burnout, with surprising consistency. In a controlled trial run by England’s Education Endowment Foundation across 68 schools and 259 science teachers, teachers using ChatGPT spent 69% as much time on lesson preparation as the control group. That’s a reduction of roughly 25 minutes every week, with no detectable difference in the quality of the materials produced, according to a blind expert panel that reviewed lesson resources from both groups. Crucially, teachers didn’t pocket that time. They redirected it toward other planning, toward grading, and toward the parts of teaching that AI couldn’t touch. It is a pattern of reallocation, not just reduction.
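For readers who want to see how the two headline numbers fit together, here is a minimal back-of-the-envelope sketch in Python. It assumes the 69% share and the roughly 25 minutes saved describe the same weekly measure of lesson preparation; the function name and the rounding are illustrative choices, not figures reported by the trial.

```python
def implied_baseline_minutes(saved_minutes: float, treated_share: float) -> float:
    """Control-group minutes implied by an absolute saving and a relative share.

    If the AI group spends `treated_share` of the control group's time, the
    saving equals (1 - treated_share) * baseline, so we solve for baseline.
    """
    if not 0 < treated_share < 1:
        raise ValueError("treated_share must be strictly between 0 and 1")
    return saved_minutes / (1 - treated_share)


# Figures quoted in the article: 69% of control time, about 25 minutes saved.
baseline = implied_baseline_minutes(saved_minutes=25, treated_share=0.69)
treated = baseline * 0.69

print(f"Implied control-group prep time: {baseline:.0f} min/week")
print(f"Implied AI-group prep time: {treated:.0f} min/week")
```

Under that assumption, the numbers imply a baseline of a little over 80 minutes of weekly preparation in the tasks the trial measured, which is far below the seven-hour U.S. average cited above. The two figures likely describe different scopes of planning work.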
What Teachers Are Actually Doing With AI — and Why
The most comprehensive picture of how teachers use AI in practice comes from an analysis of 13,071 multi-turn conversations and over 104,000 messages from more than 15,000 educators on a K-12 AI platform. Lesson planning dominates, but the data reveals a much broader pattern of use.
The conversation histories themselves are telling. The average educator prompt touches 1.7 categories at once — a teacher asking for a lesson plan that includes differentiation scaffolds and an embedded formative assessment, all in one message. Part of this is workflow maturing: teachers have moved past “generate me a lesson plan” toward more layered requests. But part of it is the AI itself. Researchers found that AI proactively introduced instructional elements — success criteria, differentiation strategies, feedback loops — even when teachers didn’t ask for them. Not every educator will take the suggestion, but the dynamic is shifting: AI isn’t just filling orders, it’s surfacing things teachers haven’t yet considered.
The most revealing finding, though, isn’t about what teachers ask AI to do. It’s about why they started using it at all.
In a qualitative study of 22 K-12 teachers across a large, diverse U.S. school district, the dominant driver for AI adoption wasn’t efficiency but survival. Teachers framed GenAI not as a productivity tool but as a sustainability measure. One described 80-hour work weeks and AI handling “bureaucratic stuff” so they could be present with students. Another reported that AI had decreased their stress dramatically. A third used it to rapidly adapt when taking on unfamiliar classes mid-year.
This matters because it reframes the value proposition, and demands a more honest accounting of where the profession actually stands. The conversation about AI in teaching isn’t happening against a backdrop of comfortable normalcy, where efficiency gains are welcome extras. It’s happening in a profession that is, by almost any measure, already in crisis. Teacher attrition remains elevated above pre-pandemic levels, and staffing shortages are pushing people into classrooms they were never trained for.
In that context, aspirational framings — AI unlocking personalized learning at scale, teachers evolving into learning architects, schools transforming into individualized ecosystems — overshoot the reality within classrooms. The more grounded, and more urgent, question isn’t how to get from good to great but how to get from unsustainable to functional. The evidence suggests AI can be part of the solution — modestly, in specific ways, for teachers who use it deliberately.
Where the Quality Holds and AI Wins
One of the persistent anxieties about AI in teaching is that faster means worse. The emerging evidence suggests the reality is more nuanced: there are distinct areas where AI excels and others where it falls short.
The EEF trial’s blind expert review suggests no significant difference in pedagogical quality between AI-assisted and traditionally prepared lesson materials. Teachers themselves largely agreed — 52% rated quality as similar, 26% said slightly higher, 22% slightly lower. And 75% reported a positive overall impact on their lesson preparation, with 78% saying they intended to continue using AI after the trial ended.
But quality isn’t uniform across lesson components. In head-to-head comparisons with human-designed equivalents, AI-generated lesson conclusions (cool-down activities, exit tickets, reflective summaries) were preferred 59.7% of the time. This was the only lesson component where AI consistently outperformed professional curriculum designers. The structured, reflective nature of lesson wrap-ups plays to AI’s strengths: synthesizing content, generating clear summary frameworks, and producing consistent quality without the variability of a human designer having an off day.
The grade-level pattern was equally striking. At the elementary level, human plans dominated — educators preferred them roughly 65% of the time, citing developmental appropriateness and engagement hooks that AI couldn’t replicate. By middle school, AI became competitive, with customized GPT-4 actually preferred 54.5% of the time. At the high school level, a fine-tuned model outperformed human designers 59.2% of the time. The more structured and content-dense the instruction, the better AI performed.
This connects to one of the more practical findings from the EEF trial: AI appears especially valuable for teachers working outside their primary expertise. The moderator analysis showed that teachers less confident in their science subject knowledge used ChatGPT more and experienced greater time savings. When a biology specialist has to teach physics, drafting a 10-question recall quiz from scratch requires 45 minutes of textbook research. AI eliminates that research friction, providing an accurate content baseline so the teacher can focus on how to teach it rather than learning what to teach. In a profession where staffing shortages routinely push teachers into unfamiliar subjects, this isn’t a convenience — it’s a structural support.
The Reallocation Effect
The evidence on time savings and quality is useful. But the most important question — does any of this actually improve student learning? — requires a different kind of study.
The cleanest answer comes from a large-scale randomized experiment across 178 public schools in Brazil, involving roughly 19,000 high school seniors. The study tested an AI writing platform that automated essay feedback. One version paired AI feedback with human graders; another let AI handle the full loop alone. A control group had neither.
The headline result: both AI groups produced identical improvements on Brazil’s national university entrance exam, independently scored. The human graders, at roughly $0.85 per essay, added zero incremental learning benefit over pure AI.
But the real story is what happened inside classrooms, not on the exam. When AI took over the mechanical grading, teachers didn’t simply work less. They reinvested the time. Students in AI-supported classrooms had roughly 35% more one-on-one conversations with their teachers about their writing. They wrote 30% more essays. Teachers in the AI-plus-human arm saw a 20% drop in at-home work hours and a dramatic shift in how they experienced their workload — the share reporting time as “very insufficient” fell from 23% to 9%. The work changed, but the workload didn’t increase.
And here’s the finding that matters most for educators: the largest learning gains weren’t on the skills AI handled well. They were on the most complex, highest-order writing task on the exam, and the one AI is least equipped to evaluate. AI can’t assess whether a student’s argument is coherent or original. But by absorbing the routine grading, it freed teachers to have exactly those conversations. The AI didn’t need to be good at evaluating arguments. It just needed to free up the person who was.
This is the reallocation effect at work. AI handles what it can — mechanical evaluation, surface-level feedback, scoring — and teachers redirect their effort toward what only they can do. The bottleneck in this system was never feedback quality, but rather teacher bandwidth.
One important caveat: the bottom quartile of students — the very lowest performers — showed no improvement under either AI condition. For the most struggling learners, freed-up teacher time alone may not be enough. This doesn’t undermine the approach, but it marks a clear boundary. AI-driven reallocation works best when students already have enough foundational skill for teacher conversations to land.
Where It Breaks Down
The evidence for AI in teaching workflows is real, but it comes with fault lines that are easy to miss if you only look at the headline numbers.
The first risk is what might be called the prompting gap. In the EEF trial, while survey data showed teachers felt confident in their prompting ability, an analysis of actual ChatGPT transcripts revealed a different picture: almost no teachers used follow-up prompts to iteratively refine AI output. They took the first result and manually edited it themselves. This is the equivalent of treating the chatbot like a search engine: one input, one output, move on. Teachers are leaving significant efficiency on the table. If the quiz is too hard, the instinct is to rewrite it by hand, not to prompt the AI to adjust the difficulty and swap in more relevant examples.
The same 13,000-conversation analysis cited earlier reinforces this pattern from a second angle and sharpens the stakes. Across that dataset, prompt quality directly determined output quality. Educators who specified standards, student context, and pedagogical strategies got materially better results than those who wrote “generate a lesson plan for fractions.” The implication is uncomfortable: the teachers who need AI most (early career, under-resourced, teaching outside their expertise) are often the least equipped to prompt it effectively. Professional development on AI use isn’t a nice-to-have. It’s a prerequisite for the productivity gains to materialize.
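To make the difference between a bare request and a specified one concrete, here is a small, hypothetical Python sketch. It does not call any AI service. It only assembles the text a teacher might paste into a chatbot, first as a prompt that names the standard, student context, and strategy, then as a follow-up that asks for a targeted revision instead of a hand rewrite. The class, the function names, and the sample classroom details are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LessonRequest:
    """The details the article says stronger prompts tend to include."""
    topic: str
    standard: str
    student_context: str
    strategy: str


def build_prompt(req: LessonRequest) -> str:
    """Assemble a first prompt that states standard, context, and strategy."""
    return "\n".join(
        [
            f"Draft a lesson plan on {req.topic}.",
            f"Standard to address: {req.standard}",
            f"Student context: {req.student_context}",
            f"Pedagogical strategy: {req.strategy}",
        ]
    )


def build_follow_up(problem: str, requested_change: str) -> str:
    """Ask for a targeted revision instead of rewriting the draft by hand."""
    return (
        f"The previous draft has a problem: {problem}. "
        f"Please revise it as follows: {requested_change}. "
        "Keep everything else unchanged."
    )


# Illustrative values only; swap in your own class and curriculum details.
request = LessonRequest(
    topic="adding fractions with like denominators",
    standard="(paste the relevant standard from your own curriculum)",
    student_context="28 fourth graders, 6 multilingual learners, 2 students with IEPs",
    strategy="concrete manipulatives first, then visual models, then symbols",
)
first_prompt = build_prompt(request)
follow_up = build_follow_up(
    problem="the exit-ticket questions are too hard",
    requested_change="lower the difficulty and use examples about sharing food",
)

print(first_prompt)
print()
print(follow_up)
```

The point of the second function is the habit, not the wording: when a draft misses, describe the miss and ask for the change.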
The second risk — identified in that same study — is what researchers called the assessment trap. Nearly half of all educator conversations with AI involved assessment-related tasks — generating rubrics, formative assessments, student feedback. But some educators requested evaluations of student work without specifying grading criteria or rubrics. AI-generated assessments applied without human oversight risk inconsistency and bias, particularly in high-stakes contexts. AI is an excellent tool for creating a rubric. It is a dangerous tool for applying one blindly.
The third risk concerns equity, and it operates at two levels. At the student level, a controlled comparison of AI-generated and human-designed lesson plans found that AI materials were consistently rated as “neutral” regarding support for multilingual learners and students with disabilities — not actively harmful, but lacking the targeted, nuanced scaffolding that human designers build in. A 30% reduction in prep time is a net negative if it comes at the expense of vulnerable learners who need explicit accommodation. Teachers must actively direct AI to include these supports, because AI won’t generate them by default.
At the teacher level, the qualitative study of 22 K-12 teachers previously cited found that the reallocation of saved time is deeply uneven. Teachers with stronger peer networks, more professional development access, and more institutional support tend to redirect freed-up time toward higher-value instructional activities — redesigning lessons, deepening differentiation, investing in student relationships. Under-resourced teachers may simply use AI to keep pace with existing demands without upgrading their practice. The efficiency dividend risks widening the gap between well-supported and under-supported schools.
What’s Coming
Most of the evidence above comes from teachers using general-purpose chatbots: typing prompts, getting outputs, and manually editing the results. That workflow is already shifting.
Agentic AI systems represent the next evolution: instead of responding to a single prompt, these systems can decompose a complex goal into sub-tasks, retrieve relevant data, execute steps autonomously, and iteratively refine their own outputs before presenting them to a user. In education, this means moving from “write me a quiz” to “design a personalized learning path for this student based on their assessment history,” with the AI handling the intermediate steps.
Early proof-of-concept work suggests the gains are real but specific. A multi-agent scoring system — where separate AI agents evaluated content, grammar, and coherence independently before a lead agent synthesized scores — outperformed standalone GPT-4o on essay grading, achieving 8.4% better accuracy and 13% greater consistency. The consistency improvement matters most: for assessment fairness, reducing variability between scores is arguably more valuable than marginally improving average accuracy.
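The structure described above, with separate agents scoring one dimension each and a lead agent combining their results, can be sketched in a few lines of Python. This is an illustration of the architecture only, not the researchers’ system: the three "agents" below are simple placeholder heuristics standing in for model calls, and the 0 to 10 scale, the equal weights, and every name are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class AgentScore:
    dimension: str
    score: float  # 0-10 scale, an assumption for this sketch
    rationale: str


Scorer = Callable[[str], AgentScore]


def content_agent(essay: str) -> AgentScore:
    # Placeholder for a model call: rewards length, capped at 300 words.
    words = len(essay.split())
    return AgentScore("content", min(10.0, words / 30), f"{words} words")


def grammar_agent(essay: str) -> AgentScore:
    # Placeholder: share of sentences that begin with a capital letter.
    sentences = [s.strip() for s in essay.split(".") if s.strip()]
    if not sentences:
        return AgentScore("grammar", 0.0, "no sentences found")
    ok = sum(s[0].isupper() for s in sentences)
    return AgentScore("grammar", 10.0 * ok / len(sentences), f"{ok}/{len(sentences)} capitalized")


def coherence_agent(essay: str) -> AgentScore:
    # Placeholder: counts a handful of connective words.
    connectives = {"because", "therefore", "however", "so"}
    hits = sum(w.strip(",.;").lower() in connectives for w in essay.split())
    return AgentScore("coherence", min(10.0, 2.5 * hits), f"{hits} connectives")


def lead_agent(
    essay: str, agents: List[Scorer], weights: Dict[str, float]
) -> Tuple[float, List[AgentScore]]:
    """Run each specialist independently, then combine with a weighted mean."""
    scores = [agent(essay) for agent in agents]
    total_weight = sum(weights[s.dimension] for s in scores)
    final = sum(weights[s.dimension] * s.score for s in scores) / total_weight
    return round(final, 2), scores


essay = (
    "Uniforms reduce morning stress because students skip outfit decisions. "
    "However, they limit self-expression. "
    "Therefore, schools should let students vote."
)
final_score, breakdown = lead_agent(
    essay,
    agents=[content_agent, grammar_agent, coherence_agent],
    weights={"content": 1.0, "grammar": 1.0, "coherence": 1.0},
)
print(final_score)
for item in breakdown:
    print(item.dimension, round(item.score, 2), item.rationale)
```

Keeping the per-dimension scores and rationales alongside the final number is a deliberate choice here: it gives the teacher reviewing the output something to check, which fits the orchestrator role discussed next.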
The teacher’s role in this model shifts from prompter to orchestrator — less time crafting the perfect input, more time reviewing and approving the output of automated workflows. The 30% time reduction currently comes from delegating content generation. The next wave will come from delegating processes, and the tradeoffs are yet to be fully considered.
The Question Worth Asking
By now the time-savings question has a clear answer: yes, AI saves teachers roughly 30% of lesson preparation time, with no measurable quality loss. But what about the tougher question: can AI reduce teacher burnout?
If the time that AI saves gets absorbed by existing demands rather than redirected toward higher-value work, then any reallocation effect will only materialize in schools that intentionally design for it. For under-resourced teachers, the default outcome isn’t transformation; it’s keeping pace.
Which brings us back to where we started. Can AI reduce burnout? The honest answer is a qualified yes: not because it shrinks the job, but because it changes where a teacher’s hours go. That relief is conditional: it takes deliberate design, including protected time and professional development, to turn saved minutes into the relational work that makes the profession sustainable.
Moving teachers from unsustainable to sustainable is a problem much larger than AI alone — but AI looks like it can make a dent, contingent, as always, on intention and implementation. The question worth asking isn’t whether AI can make teaching more efficient, but whether we’ll build the conditions for that efficiency to lift teaching off the floor of chronic burnout. From there, we can begin optimizing for better experience and learning, for teachers and learners alike, including the harder questions of reallocation and everything beyond.
Top Edtech Headlines
1. Anthropic and Gates Foundation Launch $200M Partnership Focused on AI for Education and Global Impact
Anthropic and the Gates Foundation have announced a four-year, $200 million partnership aimed at expanding the use of AI across education, health, agriculture, and economic mobility initiatives. The collaboration will combine grant funding, Claude AI credits, and technical support to build tools for teachers, students, researchers, and underserved communities.
2. The Seed 100 Barely Covers EdTech. Here’s the Map That Does.
Angela Chen has released a new map and essay covering the concentrated ecosystem of specialist funds, corporate investors, grants, and competitions that continue to fuel the sector.
3. HireEducation 2026 Salary Report
HireEducation has released their 2026 HireEducation Salary Report: a data-driven analysis of compensation trends in the education products and services sector. This report is based on salary expectations gathered during over 1,500 interviews with candidates across the industry, offering unique insights into what it truly takes to attract and retain top performers.
4. Why U.S. Test Scores Are in a ‘Generation-Long Decline’
New analysis from Stanford’s Educational Opportunity Project and The New York Times finds that while some districts are rebounding from pandemic-era learning loss, many students across the U.S. remain behind in math and reading compared to pre-pandemic levels. The findings reveal major differences in recovery between districts, underscoring how factors like attendance, staffing, and local policy decisions continue to shape academic outcomes years after school closures ended.
5. Plug and Play Foundation EdTech Pitch Competition
Have an EdTech idea? This is your chance to pitch directly to the network behind companies like Dropbox, PayPal, and Honey and get your idea investor-ready before you even step on stage.
The Plug and Play Foundation is hosting an EdTech Pitch Competition on June 1 (1-3:30 PM PST) at Plug and Play’s Sunnyvale HQ, and they’re looking for EdTech founders building the future of education.
Listen In: AI, Jobs, and What Comes Next
We recently had Kumar Garg on The Edtech Insiders Podcast!
Kumar Garg is the President of Renaissance Philanthropy, where he leads thesis-driven philanthropic funds focused on major global challenges. Previously, he worked in the Obama White House Office of Science and Technology Policy and helped build Eric Schmidt’s science and tech initiatives.
5 Things You’ll Learn in This Episode:
How AI is projected to impact jobs, with forecasts suggesting modest but meaningful workforce contraction
Why community colleges and skilled trades may see rising demand in an AI-driven economy
How “superforecasters” are being used to predict labor market shifts more systematically
What policymakers and educators should track to prepare for AI-driven disruption
How Renaissance Philanthropy is rethinking how capital flows into big, thesis-driven ideas