AI & AGI
April 15, 2026 · 6 min read

The Path to AGI: World Models vs. LLMs — and Why the Answer May Be Both

There are two dominant bets on how we reach artificial general intelligence: keep scaling large language models, or build world models that learn how reality actually works. Here's the case for each, where each breaks down, and why the real answer is probably a fusion of the two.

TL;DR

Two camps are racing toward AGI. LLMs scale next-token prediction over internet-scale data — general, useful today, but ungrounded and prone to hallucination. World models learn how reality evolves from video and interaction — grounded and sample-efficient in principle, but early and weak at abstract reasoning. The most likely path to AGI is a hybrid: a world model for grounded prediction and planning, an LLM for abstract reasoning and language, fused into one system.

By Ravi Chachra
AGI · world models · large language models · LLMs · AI thesis · frontier AI · Eight Capital

Ask two serious AI researchers how we get to artificial general intelligence and you will often get two completely different answers. One camp says we are already on the path — keep scaling large language models, add reasoning and tools, and general intelligence emerges. The other says language models are a brilliant detour, and that real intelligence requires a system that learns how the world actually works. This is not an academic disagreement. It shapes where billions of research dollars go, which startups get built, and what the next decade of technology looks like. So it is worth understanding both bets — and why the truth probably sits between them.

The LLM Path: Scale Is the Strategy

The large-language-model approach is the one everyone has experienced firsthand. Train a transformer to predict the next token across a vast corpus of human text and code, scale the model, the data, and the compute — and capabilities that nobody explicitly programmed begin to emerge: translation, coding, reasoning, summarization. Layer on reinforcement learning, chain-of-thought, and tool use, and the system starts to look unsettlingly general.
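To make "predict the next token" concrete, here is a deliberately tiny sketch in Python. It swaps the transformer for a counting-based bigram model. That substitution is made purely for illustration, since real LLMs learn these probabilities with a neural network over subword tokens and very long contexts. The training signal is the same basic idea, though: estimate a distribution over what comes next, then generate text by repeatedly picking from it. The function names and toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token is followed by each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def next_token_probs(counts, prev):
    """Turn raw follow-counts into a probability distribution over the next token."""
    followers = counts.get(prev)  # .get avoids inserting unseen keys into the defaultdict
    if not followers:
        return {}
    total = sum(followers.values())
    return {tok: n / total for tok, n in followers.items()}

def generate(counts, start, max_len=10):
    """Greedy decoding: repeatedly append the single most likely next token."""
    out = [start]
    for _ in range(max_len - 1):
        probs = next_token_probs(counts, out[-1])
        if not probs:
            break  # no known continuation
        out.append(max(probs, key=probs.get))
    return out

# Hypothetical toy corpus, split on whitespace into "tokens".
corpus = "the glass falls . the glass breaks . the cup falls .".split()
model = train_bigram(corpus)
print(next_token_probs(model, "glass"))    # → {'falls': 0.5, 'breaks': 0.5}
print(generate(model, "the", max_len=4))   # → ['the', 'glass', 'falls', '.']
```

Notice that the model "knows" a glass falls or breaks only because those words co-occur in its corpus. Nothing in it represents gravity or glass.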

The case for LLMs as the road to AGI is strong:

  • It is working right now. No other approach has produced anything close to the breadth of useful, deployed capability. Progress has been fast and, so far, has not clearly stopped.
  • Generality for free. A single model handles law, medicine, code, and poetry. Language is a compressed encoding of human knowledge, so a model that masters it inherits an enormous amount of the world.
  • A clear scaling recipe. For years, more data and more compute reliably bought more capability. That is a rare and valuable thing: a known dial to turn.
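The "known dial" above is what the field calls a scaling law: empirical studies of language models found that test loss falls smoothly, roughly as a power law, as parameters, data, and compute increase. The sketch below assumes that functional form, L(C) = (C0 / C)^alpha for a compute budget C. The constants are made-up illustrative values, not fitted numbers from any published study.

```python
# Hypothetical constants, chosen only to make the numbers readable.
C0 = 1e8      # reference compute scale (illustrative)
ALPHA = 0.05  # power-law exponent (illustrative)

def loss(compute):
    """Assumed power-law loss curve: L(C) = (C0 / C) ** ALPHA."""
    return (C0 / compute) ** ALPHA

# Loss at several compute budgets, one order of magnitude apart.
for c in [1e20, 1e21, 1e22, 1e23]:
    print(f"compute={c:.0e}  loss={loss(c):.4f}")

# Every 10x increase in compute multiplies loss by the same factor, 10 ** -ALPHA.
ratio = loss(1e21) / loss(1e20)
print(f"loss ratio per 10x compute: {ratio:.4f}")
```

What makes this a recipe rather than a hope is predictability: once the exponent is fitted on smaller runs, you can estimate what a 10x larger budget should buy before spending it. The same curve also shows why each additional order of magnitude buys a smaller absolute improvement.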

But the criticisms are equally serious:

  • No grounded model of reality. An LLM learns the statistics of text about the world, not the world itself. It has never dropped a glass or watched one fall. This shows up as hallucination, brittle physical and causal reasoning, and confident nonsense.
  • Wild data inefficiency. A child learns physics from a few years of looking and touching. An LLM needs much of the internet — and we are starting to run low on high-quality human text to feed it.
  • Planning and persistence are bolted on. Genuine long-horizon planning, memory, and self-correction are not native to next-token prediction; they are scaffolded around it, and it shows.

The World-Model Path: Learn How Reality Works

The competing bet starts from a different premise: intelligence is the ability to predict and act in the world, not to predict the next word. A world model learns an internal simulation of how an environment evolves (often from video, sensory streams, or interaction) so it can imagine the consequences of actions before taking them. This idea runs through model-based reinforcement learning and the interactive world models now coming out of the big labs, and it is championed by prominent researchers who argue that language is the wrong substrate for grounded intelligence.

The case for world models is compelling in exactly the places LLMs are weak:

  • Grounding. A system that learns from observation and interaction builds genuine intuitions about objects, physics, and cause-and-effect — the things LLMs only approximate from text.
  • Planning is native. If you can simulate the future, you can plan: try actions in your head, evaluate outcomes, and pick the best. That is the core loop of agency, built in rather than scaffolded on.
  • Potential sample efficiency. Learning from a coherent stream of experience, the way animals do, could eventually need far less data than brute-forcing the whole internet.
  • Embodiment. Robotics and any system that must act in the physical world need a predictive model of that world. This is the approach that naturally extends to atoms, not just bits.
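The "simulate, evaluate, pick the best" loop behind native planning can be sketched in a few lines of Python. Everything here is a toy assumption: `world_model` is a hand-written 1-D track standing in for a learned neural predictor, the cost is summed distance to a goal, and the planner simply tries every action sequence over a short horizon. Real systems learn the model from data and use smarter search, but the basic structure (imagine, score, act, re-plan) is similar.

```python
from itertools import product

ACTIONS = (-1, 0, 1)  # move left, stay, move right

def world_model(state, action):
    """Hypothetical stand-in for a learned world model: predicts the next state.

    Here it is a hand-written 1-D track clamped to [0, 10]; in a real system
    this would be a neural network trained on observations.
    """
    return max(0, min(10, state + action))

def imagine(state, plan):
    """Roll a sequence of actions forward inside the model, without acting."""
    trajectory = []
    for action in plan:
        state = world_model(state, action)
        trajectory.append(state)
    return trajectory

def plan_next_action(state, goal, horizon=3):
    """Try every action sequence in imagination and keep the cheapest.

    Cost is the summed distance to the goal along the imagined trajectory,
    so plans that get there sooner score better.
    """
    best_plan = min(
        product(ACTIONS, repeat=horizon),
        key=lambda plan: sum(abs(s - goal) for s in imagine(state, plan)),
    )
    return best_plan[0]  # execute only the first step, then re-plan

# Closed loop: plan, act, observe, re-plan.
# For simplicity, the "real" environment here has the same dynamics as the model.
state, goal = 0, 7
path = [state]
for _ in range(10):
    state = world_model(state, plan_next_action(state, goal))
    path.append(state)
print(path)  # → [0, 1, 2, 3, 4, 5, 6, 7, 7, 7, 7]
```

Executing only the first action of the best imagined plan and then re-planning is the model-predictive-control pattern. Exhaustive search is feasible here only because there are 3^3 = 27 candidate plans; practical planners sample candidates or optimize them by gradient instead.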

And yet the world-model path has its own hard problems:

  • It is early. Nothing in this lineage has produced the broad, deployed, money-making generality that LLMs have. The proof points are mostly in narrow or simulated environments.
  • Abstraction is unsolved. It is far from clear how a model trained to predict video gets to mathematics, law, or language — the abstract reasoning LLMs are already good at.
  • Brutal compute and data. High-fidelity prediction of rich sensory streams is enormously expensive, and we lack the clean, internet-scale interaction data that made LLMs possible.

The Hybrid Case: Probably Both

Lay the two side by side and something obvious emerges: their strengths and weaknesses are almost perfect mirror images. LLMs are strong on abstraction, language, and broad knowledge, and weak on grounding, planning, and physical reasoning. World models are the reverse. When two approaches fail in opposite directions, the engineer's instinct is not to crown a winner — it is to combine them.

The most plausible architecture for AGI looks like a division of labor. A world model provides grounded prediction and planning — the fast, intuitive sense of how things work and what happens next. An LLM provides abstract reasoning, language, and access to humanity's accumulated knowledge — the deliberate, symbolic layer. One supplies grounded intuition; the other supplies reasoning and communication. Neither is sufficient alone; together they cover each other's blind spots.

We are already seeing the early edges of this convergence. Multimodal models are pulling vision and action into systems that began as pure language. Agentic frameworks bolt planning loops and memory onto LLMs. Video and simulation models are being trained as implicit world models. None of these is the finished article, but the direction of travel is unmistakable: the two camps are quietly borrowing from each other. The honest answer to 'world models or LLMs?' is that the question is probably a false binary, and the teams that treat it as one, combining both approaches rather than picking a side, will build the most capable systems.

Why We Care

We are early-stage investors, not an AI lab, but this debate sits right at the center of how we evaluate companies. Every investment we make gets stress-tested against a simple question: when far more capable AI arrives, does this company's value increase or evaporate? The path-to-AGI question is just the long version of that test. A startup whose moat is a thin wrapper over today's best LLM is exposed if the frontier shifts toward grounded, world-model-driven systems. A startup whose advantage is proprietary data, real-world integration, domain depth, or a defensible position in the stack tends to get stronger as the models improve — whichever camp wins.

That is why we increasingly favor companies that sit on the durable side of this transition: the infrastructure both approaches will run on, the data and simulation environments that grounded systems will need, and the founders with the domain expertise to apply whatever AGI ends up looking like. We do not need to know exactly which architecture reaches general intelligence first. We need to back companies that win in more than one of those futures — and ideally in all of them.
