This Week in Edtech – May 15, 2023: Here Comes the LLM Revolution

Edtech LLMs, AI Regulation, edX and Duolingo AI tools, and more!

Sarah Morin, Alex Sarlin, and Ben Kornell

May 16, 2023

The LLM Revolution and What It Could Mean for Edtech

This week we have an article written by Edtech Insiders co-founder Ben Kornell on Large Language Models (LLMs) and what they mean for the future of edtech. Enjoy!

It’s hard to believe it was just six months ago – SIX MONTHS?!? – that ChatGPT was launched for the world, which effectively fired the starting gun for the AI race (whether we liked it or not).

Amidst the subsequent AI explosion, the one accepted truth was that OpenAI’s GPT4 and other Large Language Models (LLMs) would form the foundation for the AI revolution for years to come. Large Language Models use parameters (variables present in the model that can be used to infer new content) and tokens (individual pieces of data used as inputs or outputs of a model). ChatGPT4 has at least 175 billion parameters and is trained on at least 156 billion tokens!

Previously, the thinking on LLMs was the bigger the model, the better the AI. However, in that case, only the largest tech companies could afford to build our AI future. As ChatGPT swept into classrooms, educators’ only option was to accept what was coming our way and adapt to our new AI oligarchs.

Over the past month, cracks have appeared in this foundational narrative. A new wave of innovation is happening at the infrastructure level. Specifically, open-source LLMs and specialized LLMs have started to out-maneuver the behemoth models at OpenAI and Google. How did this happen, and what are the implications for EdTech specifically?

TLDR: the time is ripe for a dominant EdTech LLM to emerge for kid-safe AI.

Llamas and Alpacas: The Rise of Open-Source LLMs

Open-source code is software with source code that anyone can inspect, modify, and enhance. It’s used in thousands of software products, and many coders (even those who work in Big Tech) live by it – Google’s TensorFlow and Meta’s React Native are two massive open-source projects that have changed the landscape of ML and mobile development, respectively.

In the last two months, a new wave of high-quality, low-cost, open-source models was sparked by the leak of Meta’s LLaMA model weights on March 3, 2023 (‘weights’ refers to the relative influence of different parameters in a model, and LLaMA stands for “Large Language Model Meta AI”).

If you want to dive down the rabbit hole of the resultant open-source LLMs, read SemiAnalysis’s coverage of a (supposedly) leaked Google email, “We have no moat.” In essence, the LLaMA leak put out into the wild a well-trained 65 billion parameter model with no guardrails, no corporate constraints, and infinite possibility. Imagine OpenAI actually opening up their entire GPT4 model to any developer / AI-hobbyist!

Within weeks of LLaMA’s leak, developers at Stanford had launched “Alpaca” (ha ha), a version of LLaMA, which provided basic weights and allowed anyone to do fine-tuning for specific purposes. Since then, there has been a Cambrian explosion of open-source LLMs – some building off each other – with exponential acceleration. Others have forked the code to dive deep into super-specific use-cases. The net result is that with $500 and a good laptop, you too can build your own LLM!

The Meta leak and resultant growth of open-source LLMs, in and of itself, is not wholly surprising. What’s truly shocking is the quality of the outputs. Many of these bootstrapped models are achieving parity with OpenAI’s GPT3.5 and surpassing Google Bard, with the potential to overtake GPT4 imminently. Even companies as massive as Google can’t compete with a global community racing to optimize the fastest spreading tech in history.

Specialized LLMs: You Get an LLM, You Get an LLM

In parallel with open-source models, companies with incredible amounts of proprietary data have started to see the potential for specialized LLMs that can be applied to answer very industry-specific queries. ChatGPT is trained on an extremely wide dataset, thought to be over 10 million sites, but deep datasets may work better for specialized needs.

The first major mover here was Bloomberg, which in April 2023 built BloombergGPT, a finance-specific LLM from scratch, using a 50 billion parameter model (as opposed to 175 billion for GPT3 and a rumored, unconfirmed 100 trillion for GPT4). They basically ingested humanity’s entire public financial history, as well as their own proprietary data, to find 700 billion tokens and train an LLM to answer questions like ‘Should I really have bought that much GameStop stock?’ (I am only being somewhat facetious). What made the Bloomberg news a revelation was twofold: 1) a smaller model could actually work for specific purposes and 2) people might be willing to pay for it.

Because all of this moves at breakneck speed, whole companies (e.g., Faculty.ai and Lamini.ai) have already sprouted up since the Bloomberg launch to help companies both tune existing LLMs to their needs and even create their own LLMs. Not everyone has access to the 700 billion tokens that constitute the history of finance, but those without enough training data are turning to synthetic data (data created by AI itself based on real data) to augment their limited private and public data sets.

New Developer Behavior: Multi-LLM Products

The net impact of the rise of open-source and specialized LLMs is that none of these models lives in isolation: open-source LLMs can play nicely with Big Tech LLMs and specialized LLMs. Today’s frontline developers are starting to use different recipes for AI products.

Instead of utilizing one single LLM for all AI queries, savvy developers save their tokens on expensive platforms like GPT4 for only those queries where accuracy and/or complexity is highest. They mix in lower-cost queries from open-source LLMs or homegrown LLMs. Cloud providers like Google and Amazon are even offering segregated stacks where companies can blend their proprietary LLM with big tech AI without exposing private data. And in some cases, developers will also turn to a specialized LLM where industry-specific knowledge or guardrails are key.
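To make the mix-and-match idea concrete, here is a minimal Python sketch of a query router. Everything in it is an assumption for illustration: the `Query` fields, the 0.7 complexity threshold, and the three stub backends are hypothetical stand-ins, not real APIs from any provider.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Query:
    text: str
    complexity: float                     # hypothetical 0.0-1.0 difficulty estimate
    needs_domain_knowledge: bool = False  # hypothetical flag set upstream


# Stub backends standing in for real model calls (all hypothetical).
def premium_llm(text: str) -> str:
    return f"[premium] {text}"


def open_source_llm(text: str) -> str:
    return f"[open-source] {text}"


def specialized_llm(text: str) -> str:
    return f"[specialized] {text}"


BACKENDS: Dict[str, Callable[[str], str]] = {
    "premium": premium_llm,
    "open_source": open_source_llm,
    "specialized": specialized_llm,
}


def choose_backend(query: Query, complexity_threshold: float = 0.7) -> str:
    """Pick a backend name: specialized first, then premium only for hard queries."""
    if query.needs_domain_knowledge:
        return "specialized"
    if query.complexity >= complexity_threshold:
        return "premium"
    return "open_source"


def answer(query: Query) -> str:
    """Send the query text to whichever backend the router selected."""
    return BACKENDS[choose_backend(query)](query.text)
```

In a real product the complexity estimate would itself have to come from somewhere (a heuristic or a small classifier, for instance), and the threshold would be tuned against cost and quality data.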

The Future: An Edtech LLM?

This combination of factors is potentially game-changing for Edtech. It’s been clear for months that the capabilities of AI are incredibly compelling, but many developers and users are justifiably concerned about bias, data privacy, accuracy, etc., especially when it comes to AI that interacts directly with kids.

In the world where OpenAI, Google, and a few other dominant LLMs reigned supreme, there was little hope that we could change the models for the safety and success of kids. This is the social media paradigm; there’s only one Instagram, and only one TikTok, and they are - ahem - not optimized for school or even child use cases.

But in a world where developers are already mixing and matching different LLMs to build core AI features, a specialized kid-safe model, potentially built on open-source code, could weave in the right level of safeguarding without sacrificing overall AI performance. Not only that, but the Big Tech models could still be used by adults for their generalized value without risking the exposure of children.

For example, a multi-LLM school-friendly product flow might look like this:

The above flows show how student-specific data and interaction can be segregated to a specialized EdTech LLM while synthesized data can leverage GPT4 or other general-purpose LLMs for the benefit of educators or administrators. This kind of workflow, if executed well, could provide all of the AI horsepower to adults that support learners, while protecting learners themselves with a walled-garden EdTech AI.
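One way to picture that segregation is the following Python sketch. It is only an illustration under stated assumptions: `edtech_llm` and `general_llm` are hypothetical stubs, and the `StudentInteraction` fields and the simple averaging step are invented for the example. The point is that student-level data reaches only the walled-garden model, while the general-purpose model sees nothing but an aggregate.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class StudentInteraction:
    student_id: str
    prompt: str
    quiz_score: float  # hypothetical 0-100 score


# Hypothetical stand-ins for the two kinds of model.
def edtech_llm(prompt: str) -> str:
    return f"[edtech-llm] {prompt}"


def general_llm(prompt: str) -> str:
    return f"[general-llm] {prompt}"


def respond_to_student(interaction: StudentInteraction) -> str:
    """Student-facing traffic only ever reaches the kid-safe model."""
    return edtech_llm(interaction.prompt)


def synthesize(interactions: List[StudentInteraction]) -> str:
    """Reduce raw interactions to an aggregate with no IDs or prompts."""
    scores = [i.quiz_score for i in interactions]
    return f"{len(scores)} students, average quiz score {mean(scores):.1f}"


def report_for_teacher(interactions: List[StudentInteraction]) -> str:
    """Only the synthesized summary is sent to the general-purpose model."""
    summary = synthesize(interactions)
    return general_llm(f"Suggest next steps for a class with: {summary}")
```

A production system would need far stronger de-identification than a class average, but the shape of the boundary is the same: raw student data stays on one side, and only synthesized data crosses over.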

Given what’s possible in this new world of open-source and specialized LLMs, it is clear that an education-specific LLM should exist, and also that it realistically could exist. Where we need answers is ‘how?’:

- Will the EdTech LLM emerge as a finely-tuned version of an existing open-source model or will it need to be built from scratch? The difference in cost is huge, but a ‘from scratch’ model could avoid inappropriate content from its source data.
- Will one EdTech LLM emerge as the gold standard or will we see fragmentation? This is as much a market question as it is a product one. If school districts or government policy dictate a high standard for AI that interacts with children, then there is a very real market opportunity for one or two LLMs to become dominant players.
- Will investors and/or philanthropists step up to fund an EdTech LLM or will the risks of failure scare them away? This could be incredibly impactful infrastructure AND a highly-scalable business, but it is also a potentially tricky political football.

My best guess is that a dominant EdTech LLM is most likely to emerge from a consortium made up of trustworthy organizations that have both the competence to build the LLM and the brand + reach to scale it. Given what we’ve witnessed over the last six months, it’s impossible to predict. What we do know for sure is that the evolution of AI will continue to push us to recalibrate what is truly possible.

Edtech Insiders Live Events

Our next happy hour event is in Redwood City, CA on Thursday, May 18th. We would love to see you there, so please RSVP here if you’ll be able to make it!

Launch Announcement: Brisk Teaching

Announcing the launch of a new AI toolkit for educators from Edtech stars Arman Jaffer (ex-CZI, Google, White House), Tom Whitnah (ex-CZI, Meta) and Corey Crouch (Gradient Learning, Yes Prep, TFA). These are some of the most talented folks I’ve ever worked with in Edtech (sez Alex) and I’m thrilled that they’re working on something new!

Brisk Teaching is an AI assistant that saves educators 10 hours a week across tedious tasks like curriculum-writing and grading. The Chrome extension offers an AI-writing detector, a proprietary curriculum generation tool, the ability to change the reading level of any news article, with many more abilities on the way. Brisk's approach of integrating with teacher tools like Google Docs ensures that educators can work less and teach more.

Podcast Deep Dive: The Chegg and ChatGPT Fallout

With all the breaking news about Chegg’s falling stock prices last week, we did a special live episode covering what happened.

Homework helper Chegg's stock plummets as students turn to ChatGPT

If you still aren’t sure exactly what happened with Chegg and ChatGPT, and what this means for other edtech companies in the same sector as Chegg, listen to this 25 minute special episode for a full rundown!

This Newsletter Is Sponsored by Magic Edtech

Magic EdTech has helped the world’s top educational publishers and Edtech companies build learning products and platforms that millions of learners and teachers use every day. Chances are that you're using a learning product that they’ve helped design or build!
Companies like Pearson, McGraw Hill, Imagine Learning, and the American Museum of Natural History have used their help to design or build some of their learning products. Now, Magic wants to bring its pedagogical and engineering expertise to make your key learning products accessible, sticky, and well-adopted.

Top Edtech Headlines

1. AI Regulation Fever

As AI offerings continue to evolve at a rapid pace, the race to regulate AI is on in the EU, China, and the US.

- This week, the EU is legislatively taking a stand against the potential harm of unregulated new technology while trying to maintain room for innovation and growth in the field as well. Given the EU’s penchant for collaboration, some are expecting the EU’s efforts to be an attempt to build the “World Playbook for AI”.
- China has already sprinted far ahead of the pack in regulating AI.
- Meanwhile, back in the US, FTC chair Lina Khan has publicly called for regulations, and Sam Altman is asking Congress outright for regulation… but many are skeptical about our polarized, gerontocratic Congress’s ability to get ahead of the technology, as per news like “Congress wants to regulate AI, but it has a lot of catching up to do” (NPR) and “AI regulation: U.S. lawmakers 'incredibly late to the party', says expert” (Yahoo Finance).

2. edX Debuts New AI Tools

We are seeing new AI tools released in the Edtech space every week, and this week Edtech giant edX released two new AI-powered tools: an edX plugin for ChatGPT that directs the user towards available courses on edX based on their ChatGPT searches, and an AI assistant called edX Xpert available through the edX platform.

3. Duolingo Growing, Launching Music App

Duolingo continues to lead the pack for public Edtech stocks; its earnings report last week showed 62% DAU growth, 63% paid subscriber growth, 42% revenue growth, and increased profitability in the first quarter of 2023, with increased revenue expectations.

After launching Duolingo ABC, Duolingo English Test, Duolingo Math, and the recent Duolingo Max AI suite, Duolingo announced last month that it’s moving into music.

Duolingo is indeed on fire, with increased growth and profitability.

Duolingo CEO Luis von Ahn on the rise in paid subscriptions:

“So our paid subscriptions just have been consistently growing ever since we launched our subscription. That's just basically -- they just keep growing and growing and growing. And there's no single reason for that.
I mean we just get better and better at converting our users. And we do that by a number of things. I mean, one of them is just making the subscription more interesting by adding more features to it or improving the features for it. But also by merchandising, we get a lot better at knowing when to advertise the subscription, what to say to you to get you to subscribe.
So all of that just -- we run enough A/B tests that gets us to more and more subscriptions. I think that's the main answer. That's just the standard thing that keeps happening every quarter. So there was a jump -- there's been a jump in, I would say, every quarter since we've been public.” (source)

4. Singapore Leads the Way in Upskilling

Upskilling has become a focus of the Edtech industry as the rapidly changing nature of AI and technology continues to impact the workplace. Singapore, whose education system is known for its agility and hyper-responsiveness, is placing extreme focus on this topic, with corresponding results. Read more about their efforts and outcomes here!

5. ASU Shoots for the Astelars

With Dreamscape Learn, Dreamscape Immersive and ASU have launched virtual science labs in which students can use immersive tech to experience in-depth science labs.

Now, ASU has dropped in-person labs for introduction to biology in favor of the immersive experiences, which are showing positive results for engagement and grades, including among historically marginalized populations (report here).

We’re keeping an eye on this; in a world in which colleges are increasingly in competition for declining enrollments, VR theaters, which combine learning, tech and fun, may become a highlight of the campus tour.

In a recent VR experience, Arizona State biology students found themselves in a virtual habitat for imaginary creatures known as astelars (“imaginary starfish-like creatures that change colors.”)… In the VR, students went on a journey to investigate why the astelar population is in decline.
Upon removing their goggles, the students gathered in small groups to consider data and model scenarios related to whether the species might fare better in a physically hot environment or in a predator’s environment. As the group hovered over a spreadsheet flush with data, one student offered a proposed solution.
“Don’t do that!” another student responded as Hale and her team looked on. “You’re going to fucking kill the astelars!” That was the moment that Hale understood that VR fostered the students’ empathy for the (imaginary) creatures.
Susan D’Agostino, Inside Higher Ed

Bonus: Byju’s Continues to Confuse

Matt Tower of Edtech Thoughts put together a chronological list of Byju’s headlines between March and July 2022, which whiplashed from extremely positive to extremely concerning, with all the hallmark eye-popping numbers we’ve come to expect from Byju’s, India’s (and the world’s) largest private Edtech.

Prepare for yet another chapter in the roller-coaster world of Byju’s.

In the wake of last week’s startling news that Indian financial authorities had raided three properties connected to Byju Raveendran, founder and CEO of Byju’s…
… Byju’s now looks to be on its way to closing a ~$1B funding round at its (contested) $22B valuation, in addition to the $5.8B it has already raised from a who’s who of global VC. The first $250M just closed from New York firm Davidson Kempner, in anticipation of Byju’s taking one of its subsidiaries, physical tutoring center chain Aakash, to an IPO at a potential $1B valuation.

To put that $22B valuation in context, it’s more than the market cap of the three largest public Edtech companies (Pearson, New Oriental and Duolingo) combined.

Or try this: if you added the market cap of Powerschool, Wiley, Coursera, Udemy, Scholastic and Kahoot! together, and then doubled it, you’d be about at $22B. 🤯

Recent Edtech Insiders Podcast Episodes

We’ve had a ton of amazing guests on the podcast recently, and some unforgettable conversations! Many of the episodes we’ve released in the past week were taped during ASU+GSV, and are not to be missed! Be sure to keep up with all our recent episodes of the Edtech Insiders podcast here!

Perry Kalmus of college admissions platform Akala

Ashley Chiampo, CLO of Emeritus

Chuck Cohn and Anthony Salcito, CEO and CIO of Nerdy

Funding Rounds

MPOWER Financing raises $150M credit facility
- US-based Fintech/Edtech MPOWER “provides loans to international students to study in the US and Canada.” The money is coming from a little mom-and-pop shop in New York called Goldman Sachs.
Edflex raises $13M
- Edflex, an upskilling platform in France, raised money in a round led by European Edtech VC firm Educapital.
Teky Alpha raises $5M
- Teky Alpha, a Vietnamese Edtech startup, raised the money from the Singapore-based impact investment firm Sweef Capital (“Sweef” seems to stand for “Southeast Asia Women's Economic Empowerment Fund”).
- The company operates 16 STEAM academies in five cities across Vietnam and has partnered with more than 45 schools across the country to deliver STEAM courses to over 25,000 children.
  - If you are thinking that “Teky Alpha feat. Sweef” would look right at home on a US rap album, you are correct.
Simango raises $3.8M
- Simango is another French Edtech, which conducts healthcare training in VR, combining two rising trends in the Edtech world. The round was led by Epopée gestion (translated as “Epic Management”) and Breizh Up (Breizh means “Brittany” in Breton, the local language of that French region).

Acquisitions

Lots of strategic acquisitions this week!

Classcraft acquired by HMH
- Classcraft, a leading platform for gamification and differentiation in education, was acquired by digital content giant HMH. Check out our interview with CEO Shawn Young of Classcraft from fall 2022.
Go1 acquires Blinkist
- Blinkist, a platform which provides short-form synopses of books (and one of Alex’s favorite Edtechs: 200 blinks read!), was acquired by Australian B2B Edtech unicorn Go1.
D2L acquires Connected Shopping (creators of platform Course Merchant)
- D2L, the Canadian LMS platform with ~17% of the US Higher Ed market, acquired a platform that handles payments and enrollments, allowing course creators to make and sell courses. This puts D2L into the same space as async platforms Teachable, Thinkific, Kajabi and Podia, and possibly into the cohort-based course space with Maven, Disco and Mighty Networks.
HireVue acquires Modern Hire
- Talent acquisition platform HireVue acquired a rival this week, apparently with the aim of “bolstering its skills-based screening capabilities and ethical AI efforts.” Skills-based screening and skills training are two sides of the same market.

Thanks for reading Edtech Insiders! Subscribe for free to receive weekly updates and support the podcast.