Code is now free, but working code is not (yet).

Ludwig Wendzich

March 20, 2026

Part 3 of the series that also includes "Marshmallowy: What LLMs Change — and Where the Edges Are" and "If You're Adding a Chatbot, You've Missed the Point".

While many people are prophesying the death of the Software Engineer as a profession (and that may well prove true), right now, in March 2026, we're living in a messy middle that's far more interesting than the prophecy.

Since Opus 4.5 dropped last year, Claude Code has shown that it can write code reliably. Many people, the makers of Claude Code included, say they've stopped writing code entirely since that inflection point. When someone challenged why Anthropic is still hiring engineers if "100%" of the code is written by Claude, Boris Cherny, the creator of Claude Code, responded: "Someone has to prompt the Claudes."

We prompt the Claudes.

Sterling started in October of last year, just before that inflection point. My co-founder Nik had been building the POC for a few months before that, and I'd been helping folks build apps using Cursor. Prior to that tipping point, we were both dancing on the edge. Nik probably had more reason to dip into editing code directly — he probably had an IDE installed on his machine. Other than Cursor, I did not. I still do not. And at this point I'm pretty confident Nik doesn't fire up his IDE either.

So yes: humans shouldn't be writing code anymore. That part of the prophecy has landed. But here's the thing nobody talks about enough.

The Dark Factory Problem

There's talk of "Dark Factories" being applied to software. The idea comes from manufacturing: robots don't need lights, so a factory built around robots can run in the dark. What happens when you design a software factory around LLMs? What changes?

Folks like Steve Yegge are going to incredible lengths to figure this out. His Gas Town project orchestrates colonies of 20–30 parallel AI agents through structured hierarchies — mayors, coders, reviewers — all trying to work as a self-sustaining software team. It's not just him. Anthropic's own team tasked 16 Claude agents with building a C compiler from scratch. Over two weeks and $20k in API costs, they produced a 100,000-line Rust-based compiler that passes 99% of GCC's torture tests and can compile the Linux kernel. Cursor's team orchestrated hundreds of agents to build a web browser in a week, generating over a million lines of code.

These experiments are fascinating. And they all share something in common that's easy to miss.

The People Who Built the Test Suite Did the Heavy Lifting

Every one of these successful autonomous experiments is building against a very strict test suite. The C compiler has GCC's torture tests. Cursor's browser had defined rendering specs. It's the existence of that test suite that allows the agents to succeed: they have something concrete to pass. And that test suite is the result of decades of human work.

I have not seen an example of somebody building a new application, or even harder, modifying an existing application, successfully without human intervention. Sure, agents can write automated tests — and they do. Sure, agents can even browse through your application — and we do that too. But their context window, no matter how big, can't yet match mine.

I have a working understanding of most of our app. I also have a working model of human interaction, and of our customers. That's a lot of context, and somehow my brain can mix it all together in an instant and spot something that's off, or test a non-happy path that the agent didn't think of. That's what's still required to deliver a quality experience. This is the "taste" it takes to build something insanely great, something internally consistent and delightful, instead of a Frankenstein's monster of an experience.

The Impedance Mismatch

Here's where it gets uncomfortable. There is a genuine impedance mismatch between the speed at which we can create code today — because it is absolutely true that code is effectively free — and the speed at which that code can be made production ready. Or as Simon Willison puts it: proven to work.

At Sterling we run 5–6 agents at once. We absolutely believe the world has changed and multi-threaded engineering is the new normal. But we're still limited by the speed at which one person can get those changes production-ready. Code creation has scaled dramatically. Code verification has not.
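To make that mismatch concrete, here is a minimal sketch with made-up numbers. The function names, the per-agent output, and the reviewer capacity are all assumptions for illustration, not measurements from Sterling. It only shows the arithmetic: what ships is capped by whichever is smaller, what the agents produce or what one person can verify.

```python
def shipped_per_day(agents, changes_per_agent, verify_capacity):
    """Changes reaching production per day (hypothetical model).

    Output is capped by the slower stage: agent production
    or one human's capacity to verify changes.
    """
    produced = agents * changes_per_agent
    return min(produced, verify_capacity)


def unverified_backlog(days, agents, changes_per_agent, verify_capacity):
    """Changes written but not yet proven to work after `days` days."""
    produced = agents * changes_per_agent
    shipped = shipped_per_day(agents, changes_per_agent, verify_capacity)
    return days * (produced - shipped)


# Illustrative numbers only: 6 agents, 4 changes each per day,
# and one person who can verify 8 changes per day.
print(shipped_per_day(6, 4, 8))        # capped by verification
print(shipped_per_day(12, 4, 8))       # doubling agents ships no more
print(unverified_backlog(5, 6, 4, 8))  # backlog after a working week
```

In this toy model, adding agents past the verification limit only grows the pile of unproven code. Raising the verification capacity is the only change that ships more.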

Anyone Can Build a Prototype

One thing I hear a lot is that "anyone can build a SaaS app", so SaaS is dead. It is absolutely not. Anyone can build a prototype, sure. But that prototype is not tested. It's not reliable or scalable. And most laypeople have neither the time to do that testing nor the expertise to make sure it scales.

For Sterling, it is absolutely possible for our customers to vibe-code their own version of what we do. And then they'd spend more time making it work reliably than they currently spend on the manual work they're trying to replace. Claws included.

It doesn't need to be a SaaS application running on a production server somewhere. It just needs to be a system that's built up enough complexity to be sufficiently valuable, that's been in use for a while, and has folks relying on it. That's still hard. That's what takes time.

Let's not forget that for many valuable use cases the application won't be self-contained: there'll be third-party APIs that need a developer account (which has to be applied for), and marketplace approvals that take time to come through and block progress until they do.

The Messy Middle

Are we building a Software Factory? Of course we are. We're pretty AI-pilled at Sterling — for obvious reasons. But I maintain that just like we need to understand the limits of LLMs to build reliable agents for our customers, we also need to understand the limits of coding agents so we can build a reliable application. We spend more time than most right now building our Software Factory, and at the same time we have to wrestle with this bottleneck that, today, isn't a solved problem. It might be one day. That's not today.

We live in a time where the role of a software engineer has changed completely: they shouldn't be writing code anymore. And yet they are still vital to delivering quality software. In fact, they are still the bottleneck. Throughput has scaled up ridiculously with agents, but it's not unbounded. And that bottleneck tightens as the complexity of your application and the size of your organisation increase.

The bottleneck moved. It used to be writing the code. Now it's proving the code works. And until that's solved, the Software Engineer isn't going anywhere.
