Verifying AI-Written Code: Why We Backed Theorem | Eight Capital

AI has gotten remarkably good at writing code. It has gotten almost no better at telling you whether that code is correct. As more and more of the world's software gets generated by models rather than typed by humans, that gap becomes the single most important problem in software — and it is the problem Theorem, a company we've backed out of YC's Spring 2025 batch, exists to solve.

The New Bottleneck Isn't Writing Code — It's Trusting It

For most of software history, writing code was the expensive, slow, human part. AI has inverted that. Generation is now cheap and fast; what is scarce is confidence that the generated code does what it is supposed to do and nothing else. AI-written code carries subtle bugs. AI-enabled attackers can probe software for weaknesses at a volume and velocity human security teams cannot match. And human code review, the traditional backstop, simply does not scale to a world where machines write the code.

So the question that matters is no longer 'can AI write this?' It is 'can we prove it's right?' That shift — from generation to verification — is where the next layer of essential software infrastructure gets built. Theorem is building exactly that layer.

What Theorem Builds

Theorem applies AI to formal verification — the mathematical discipline of proving that software behaves exactly as specified. Historically this was the most rigorous and the most painfully slow way to write software, reserved for the highest-stakes systems. Theorem's bet is that AI changes the economics entirely: train models that are as capable at proving code correct as today's models are at writing Python, and verification stops being a luxury and becomes a default. Their stated mission is as blunt as it gets — verify all software.

The signature method is program-equivalence-driven development: a developer writes a simple reference implementation that is easy to read and trust; Theorem's AI then generates a high-performance version and produces a mathematical proof that the two are functionally identical. You get the speed of optimized code with the certainty of the simple version — useful for performance work, for migrating code between languages, and for any setting where 'probably correct' is not good enough.
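To make the workflow concrete, here is a toy sketch of our own; it is not Theorem's tooling or API. The function names (`popcount_reference`, `popcount_fast`, `first_counterexample`) are hypothetical, and the choice of bit counting on 16-bit inputs is ours. Because the input space is small enough to enumerate completely, checking every input is a genuine, if brute-force, proof of equivalence on that domain. Real verification tools aim to establish the same fact symbolically, for input spaces far too large to enumerate.

```python
def popcount_reference(x: int) -> int:
    """Reference version: count set bits one at a time. Slow, easy to trust."""
    count = 0
    while x:
        count += x & 1
        x >>= 1
    return count


def popcount_fast(x: int) -> int:
    """Optimized version for 16-bit inputs: sum bits in parallel fields."""
    x = x - ((x >> 1) & 0x5555)             # bit count of each 2-bit field
    x = (x & 0x3333) + ((x >> 2) & 0x3333)  # bit count of each 4-bit field
    x = (x + (x >> 4)) & 0x0F0F             # bit count of each byte
    return (x + (x >> 8)) & 0x001F          # total for the 16-bit word


def first_counterexample(bits: int = 16):
    """Compare both versions on every input below 2**bits.

    Returns the first input where they disagree, or None if they agree
    everywhere, which proves equivalence on this finite domain.
    """
    for x in range(1 << bits):
        if popcount_reference(x) != popcount_fast(x):
            return x
    return None


if __name__ == "__main__":
    witness = first_counterexample()
    if witness is None:
        print("equivalent on all 16-bit inputs")
    else:
        print(f"versions disagree on input {witness}")
```

The design point is the division of labor: a human only has to read and trust the short reference function, and the equivalence result transfers that trust to the version nobody wants to review by hand.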

And this is not a deck full of promises. Their published research already shows the approach working: an AI system that translated well over a thousand formal statements between proof languages hundreds of times faster than humans, and a proof technique in which testing effort scales logarithmically, rather than linearly, with how rare a bug is. These are the early, concrete signs that AI-native verification is real and accelerating.
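For a rough sense of what the logarithmic versus linear distinction means, the sketch below compares growth rates only. It is our own back-of-the-envelope arithmetic, not Theorem's method or published numbers. The linear baseline is standard: if a bug fires with probability p on each independent random test, catching it with 95% confidence takes about 3/p tests. The logarithmic column is just log2(1/p) with an unknown constant factor, and both function names are hypothetical.

```python
import math


def random_tests_needed(p: float, confidence: float = 0.95) -> int:
    """Independent random tests needed to trigger, with the given confidence,
    a bug that fires with probability p per test. Grows like 1/p."""
    # Solve (1 - p)**n <= 1 - confidence for n; log1p keeps tiny p accurate.
    return math.ceil(math.log(1 - confidence) / math.log1p(-p))


def log_growth(p: float) -> float:
    """Growth rate only: log2 of the bug's rarity. Constant factors for any
    real technique are unknown and deliberately left out."""
    return math.log2(1 / p)


if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        print(f"p={p:g}: random tests ~{random_tests_needed(p):,}, "
              f"log2(1/p) ~{log_growth(p):.1f}")
```

Each time the bug becomes a hundred times rarer, the random-testing column grows roughly a hundredfold, while the logarithmic column rises by less than seven.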

Why These Founders

Verification is a domain where credibility is everything, and the founding team has it in a way almost no one does. Before Theorem, co-founder Jason Gross earned his PhD at MIT building verified cryptography code that now helps secure the HTTPS protocol behind trillions of internet connections every day. By his own estimate, that project took roughly fifteen person-years to complete by hand. Theorem is, in a sense, his attempt to compress that fifteen person-years of effort into something AI can do continuously. With co-founder Rajashree Agrawal, this is a team whose entire background is the exact problem they're now attacking. That is the biographical founder-market fit we look for: founders who didn't pick a market, but were made for one.

Why It Passes Our Test

We stress-test every investment against one question: when far more capable AI arrives, does this company's value increase or evaporate? Theorem is one of the cleanest 'increase' answers we've seen. A better model is pure tailwind: more capable AI means more AI-generated code, which means more code that must be verified, in higher-stakes settings, against more sophisticated AI-driven attacks. The demand for the trust layer grows in lockstep with the capability of the models. It is the precise opposite of a thin wrapper, whose value evaporates the moment the underlying model improves.

We're not alone in seeing it. Theorem recently raised a $6M seed round led by Khosla Ventures, with participation from Y Combinator and a roster of technical angels. That is exactly the pattern we like as a pre-Demo-Day investor: get in early on conviction, and watch top-tier funds confirm the thesis shortly after.

The Honest Tension

It would be dishonest to pretend this is a sure thing, and the risk is worth naming. Formal verification has a long history of being intellectually brilliant and commercially narrow — for decades it lived mostly in aerospace, defense, and cryptography, because the cost of doing it outweighed the benefit for everyone else. The entire bet rests on a single premise: that AI finally flips that cost-benefit equation, turning verification from a luxury into something cheap enough to apply everywhere. We think the AI inflection is exactly what makes now the right moment — and the smart wedge is to start where the stakes are already highest (public infrastructure, finance, hardware) and expand outward as the cost of proof keeps falling.

Why We Backed Theorem

Put the pieces together and Theorem is a near-archetype of what we look for: a deep technical moat that is the opposite of a wrapper, a founding team built for the exact problem, and a business that gets more valuable as AI gets more capable. In a world where AI is eating software, the companies that endure are the ones that own something a better model makes more essential, not less. Proving that code is correct — when more of it than ever is written by machines — is about as essential as it gets. That is the trust layer, and we think it becomes load-bearing infrastructure for the entire AI-software era.
