18 May 2026

Where the work goes when the code writes itself

What changes in the SDLC when the most expensive stage stops being the most expensive, and why the work is now upstream in the decisions a system is meant to make.

Most descriptions of how AI will change software development do not describe software development. They describe coding. The two are not the same, and the distinction is now the most important one a technology leader can hold in their head.

Coding is the stage where intent becomes source. The software development lifecycle, or SDLC to use the older acronym, is everything around it. It includes the discovery work that turns a vague business problem into a tractable one. It includes the specification work that turns a tractable problem into something engineers can build. It includes the integration, review, and validation work that turns built code into deployable code. And it includes the operation, monitoring, and feedback work that turns deployed code back into the next problem. Coding has always been one stage among many. The reason we came to conflate it with the whole is that, for forty years, it was where the cost lived.

That has changed. The interesting question for a leadership team is not whether AI writes code well enough. By now, in the contexts where it works, it plainly does. The question is what the SDLC looks like when the most expensive stage stops being the most expensive.

The shape of the lifecycle is not the shape of the ceremonies

Before getting to the changes, it helps to be precise about what is changing. The phrase "the SDLC" has come to mean several different things at once, and conflating them produces sloppy thinking.

In the most literal sense, the SDLC is a sequence of value transformations. A question becomes a brief. A brief becomes a specification. A specification becomes a working system. A working system becomes a deployment. A deployment becomes telemetry, and telemetry becomes the next question. Each arrow is real work. It consumes time, judgement, and often political capital.

In a looser sense, "the SDLC" has come to mean the ceremonies that surround those transformations. Standups, refinement sessions, sprint reviews, change advisory boards, post-incident reviews. These are not the lifecycle. They are scaffolding put up to manage the lifecycle, and the scaffolding is often mistaken for the building.

The distinction matters because AI is doing very different things to each. It is collapsing several of the transformations to near-zero cost. It is leaving the ceremonies more or less untouched. Many organisations now have ceremonies that consume more elapsed time than the work the ceremonies were invented to coordinate. That is a kind of operational dissonance, and it will resolve itself, one way or another, in the next two to three years.

What collapses, what survives, what emerges

If you map AI's effect onto the transformations rather than the ceremonies, three things become visible.

A handful of stages collapse. Translating a clear specification into source code now takes something closer to a few minutes than a few weeks. Writing a first draft of a test suite. Generating boilerplate for a new service. Producing migration scripts. Drafting API clients from upstream schemas. These are still genuine engineering activities, but their unit cost has fallen by something between one and two orders of magnitude. So has the cost of producing the artefacts adjacent to code. Changelogs, release notes, runbook drafts, infrastructure-as-code scaffolding, documentation that actually reflects the code. None of these were the most important parts of building software. They were simply the most laborious.

Some stages are largely unchanged, and the reasons are worth being honest about. Negotiating between two product owners with incompatible mental models of a feature is not a token-prediction problem. Deciding whether a regulator will accept a particular interpretation of a data-residency clause is not a token-prediction problem. Reading a customer's face during a discovery interview is not a token-prediction problem. The parts of the SDLC that involve adversarial intent, irreducible ambiguity, or social judgement are doing more or less what they did before. They will continue to.

The most interesting category is the stages that emerge. This is work that did not have a name a few years ago because the cost of doing it was prohibitive. Continuous evaluation of model outputs against ground truth. Maintenance of structured specifications that are themselves executable. Capture of decision context in a form that downstream automation can read. The deliberate cultivation of feedback loops so that telemetry drives specification, not just informs it. These were always good ideas. They are now affordable.

The constraint moves upstream

When you compose those three movements, a clean pattern appears. The bottleneck of the lifecycle has moved upstream, away from the keyboard and back toward the conversation. The expensive stage is no longer turning a specification into code. It is producing a specification good enough that the resulting code is the right code.

The useful way to think about what a specification actually contains is as a set of decisions. A working system, viewed from one angle, is a long sequence of decisions that someone, at some point, made on its behalf. What counts as a valid order. Which customer states allow which transitions. What the system should do when an upstream service is degraded but not down. Most of these decisions live in the heads of three or four people. Some live in code as conditional logic, which is to say as artefacts of the moment they were written, and are difficult to recover later. A few live in policy documents that nobody reads.

The work that has moved upstream is not, mostly, specifying features. It is surfacing those decisions, naming them, and capturing them in a form precise enough that downstream automation can act on them without re-deriving them every time. Teams that find real traction with AI in the lifecycle usually stop arguing about what to build and start arguing about which decisions the thing they are building is meant to make.

This sounds like a return to the waterfall era, but it is not. The specifications that work in this new regime are not two-hundred-page documents written before a line of code is touched. They are smaller, structured artefacts. A few paragraphs. A decision record. A set of worked examples that can be revised in tight loops against working software. The difference is that the specification is now the thing humans spend their time on, and the code is, to a first approximation, a compiled output of it.

Anyone who has watched a strong engineering team work with AI tools recognises the pattern. The senior engineer spends almost no time typing. They spend their time interrogating the problem, articulating the constraints, naming the failure modes, and reading the diff. They reach the hard part of the problem faster than they used to, because the baseline functionality is no longer where the time goes. The shift looks small from the outside. It is structurally enormous.

What this means for where to invest

For an executive sponsor, the practical question is what to fund. The honest answer is: not more coding capacity. The places where the marginal dollar now produces disproportionate return are unglamorous, and several of them sit outside the traditional engineering budget line.

The first is the capability to specify decisions. The practice of writing down the decisions a system is supposed to make, in a form precise enough that they can be checked. In most organisations this work is currently done verbally, in meetings, and is forgotten within a quarter. The decisions get re-derived every time a new engineer joins the team, with the predictable result that they slowly drift, and the drift goes unnoticed until the system does something embarrassing in production. Treating decisions as artefacts, with owners and lifecycles, is closer to a librarianship problem than an engineering one. It pays back almost immediately.
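To make "precise enough that they can be checked" concrete, here is a minimal Python sketch of a decision captured as an artefact: an identifier, an owner, a plain-language statement, and a handful of worked examples that any implementation can be run against. The `DecisionRecord` class, its fields, and the valid-order rule are hypothetical illustrations, not a description of any particular tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DecisionRecord:
    """A hypothetical decision artefact: who owns it, what it says, how to check it."""
    decision_id: str
    owner: str
    statement: str
    # Each worked example pairs an input with the outcome the decision requires.
    examples: list[tuple[dict[str, Any], Any]] = field(default_factory=list)

    def check(self, implementation: Callable[[dict[str, Any]], Any]) -> list[str]:
        """Run every worked example against an implementation; return the failures."""
        failures = []
        for inputs, expected in self.examples:
            actual = implementation(inputs)
            if actual != expected:
                failures.append(
                    f"{self.decision_id}: {inputs} -> {actual!r}, expected {expected!r}"
                )
        return failures


# Hypothetical decision: what counts as a valid order.
valid_order = DecisionRecord(
    decision_id="DEC-001",
    owner="orders-team",
    statement="An order is valid only if it has at least one line item and a positive total.",
    examples=[
        ({"items": 2, "total": 40.0}, True),
        ({"items": 0, "total": 0.0}, False),
        ({"items": 1, "total": -5.0}, False),
    ],
)


def is_valid_order(order: dict[str, Any]) -> bool:
    return order["items"] > 0 and order["total"] > 0


print(valid_order.check(is_valid_order))  # []
```

The design choice worth noting is that the worked examples, not the prose statement, are what make the decision checkable: if the rule drifts in code, `check` returns a non-empty list instead of the drift going unnoticed until production.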

The second is evaluation infrastructure. The boring plumbing that lets a team know, on a continuous basis, whether their system is doing what it is meant to do. Ground-truth datasets. Regression suites that survive contact with production traffic. The operational discipline to run them. Organisations that have invested here are pulling away from the rest of the field in a way that is becoming hard to disguise.

The third is the institutional memory behind the decisions. The why behind the what. The first investment captures what the system decides. This one captures why the team decided to build it that way. AI systems are extraordinarily good at executing on context, and pitifully bad at recovering context that was never written down. Companies that capture decisions, constraints, and trade-offs in machine-readable form will compound. Companies that keep those things in the heads of three senior people will discover, on the day those people leave, what they actually lost.

None of these are model purchases. None of them are platform licences. They are organisational investments, and they look more like a quiet build-up of capability than a launch.

What this means for how to build it

For a technical leader, the corresponding question is what the implementation looks like. A few principles are emerging from the teams that are getting this right.

Treat the lifecycle as state, not procedure. A modern SDLC is best understood as a long-running workflow with persistent state. What stage each piece of work is in. What artefacts exist at that stage. What gates have been passed. State can be inspected, queried, and reasoned about by other software. Procedure cannot.

Treat agents as services, not assistants. The useful unit is not a chat window attached to a developer. It is a well-scoped, well-instrumented service that performs a specific transformation in the lifecycle. Drafting a specification from a discovery transcript. Generating a test plan from a specification. Producing a runbook from an incident timeline. Services have inputs, outputs, telemetry, and SLAs. Assistants have moods.

Treat evaluation as a first-class artefact, not an afterthought. The hardest discipline to build is the habit of asking, every time a new automation is introduced, how will we know when it stops working? Teams that answer this well end up with infrastructure that resembles a small internal observability platform. Teams that answer it badly end up with confident-sounding outputs and no way to detect drift.
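One way to give "how will we know when it stops working?" a concrete answer is to keep a labelled ground-truth set for each automation and score it against a recorded baseline on every run. The sketch below assumes a simple exact-match metric and a fixed tolerance; `evaluate`, `EvalResult`, the severity classifier, and the numbers are hypothetical, and a real pipeline would likely use richer metrics and far larger datasets.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalResult:
    accuracy: float
    drifted: bool


def evaluate(
    automation: Callable[[str], str],
    ground_truth: Sequence[tuple[str, str]],
    baseline_accuracy: float,
    tolerance: float = 0.05,
) -> EvalResult:
    """Score an automation on labelled cases and flag drift against a baseline."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(1 for text, expected in ground_truth if automation(text) == expected)
    accuracy = correct / len(ground_truth)
    # Drift means accuracy has fallen more than `tolerance` below the recorded baseline.
    return EvalResult(accuracy=accuracy, drifted=accuracy < baseline_accuracy - tolerance)


# Hypothetical automation: a naive incident-severity classifier.
def classify_severity(text: str) -> str:
    return "high" if "outage" in text else "low"


# Hypothetical ground-truth set of (input, expected label) pairs.
CASES = [
    ("full outage in eu-west", "high"),
    ("typo on pricing page", "low"),
    ("partial outage, retries succeed", "high"),
    ("dashboards load slowly", "medium"),
]

result = evaluate(classify_severity, CASES, baseline_accuracy=0.9)
print(result)  # EvalResult(accuracy=0.75, drifted=True)
```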

What this looks like when you actually build it

These principles are easier to write down than they are to live with, so it is worth describing what they look like in a real system. We have spent the last year building one. A few of the patterns are non-obvious until you have made the wrong choices once or twice.

The first is that stages have to be enums in a database, not headings in a deck. Every piece of work in our platform moves through a defined sequence of stages, from scope and planning through to implementation and operation. Those stages are first-class values in the schema, with constraints on which transitions are legal. The effect is that the question "where is this work up to" is a query, not a meeting. It also means that any agent in the system can reason about the state of any program without asking a human, which sounds like a small thing until you realise it is the difference between automation and theatre.
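The following is a minimal sketch of the stages-as-data idea using Python's built-in `sqlite3` module. It assumes four stage names taken from the sentence above and a hypothetical transitions table; the legal-transition rule is enforced by a database trigger, so an illegal move is rejected no matter which agent or human issues it, and "where is this work up to" becomes a `SELECT`. The table, trigger, and function names are illustrative, not taken from the platform described here.

```python
import sqlite3

# Hypothetical stage names and legal transitions.
STAGES = ("scope", "planning", "implementation", "operation")
LEGAL_TRANSITIONS = [
    ("scope", "planning"),
    ("planning", "implementation"),
    ("implementation", "planning"),  # rework: send back for re-planning
    ("implementation", "operation"),
]


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    stage_list = ", ".join(f"'{s}'" for s in STAGES)
    conn.executescript(f"""
        CREATE TABLE stage_transition (
            from_stage TEXT NOT NULL,
            to_stage   TEXT NOT NULL,
            PRIMARY KEY (from_stage, to_stage)
        );
        CREATE TABLE work_item (
            id    INTEGER PRIMARY KEY,
            title TEXT NOT NULL,
            stage TEXT NOT NULL DEFAULT 'scope' CHECK (stage IN ({stage_list}))
        );
        -- Reject any stage change that is not listed in stage_transition.
        CREATE TRIGGER enforce_transition
        BEFORE UPDATE OF stage ON work_item
        WHEN NOT EXISTS (
            SELECT 1 FROM stage_transition
            WHERE from_stage = OLD.stage AND to_stage = NEW.stage
        )
        BEGIN
            SELECT RAISE(ABORT, 'illegal stage transition');
        END;
    """)
    conn.executemany("INSERT INTO stage_transition VALUES (?, ?)", LEGAL_TRANSITIONS)
    conn.commit()
    return conn


def advance(conn: sqlite3.Connection, item_id: int, to_stage: str) -> bool:
    """Try to move a work item to a new stage; return False if the database rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("UPDATE work_item SET stage = ? WHERE id = ?", (to_stage, item_id))
        return True
    except sqlite3.IntegrityError:  # RAISE(ABORT, ...) surfaces as a constraint error
        return False


conn = open_db()
with conn:
    conn.execute("INSERT INTO work_item (title) VALUES (?)", ("billing export",))
print(advance(conn, 1, "planning"))   # True
print(advance(conn, 1, "operation"))  # False: planning -> operation is not legal
print(conn.execute("SELECT stage FROM work_item WHERE id = 1").fetchone())  # ('planning',)
```

Putting the rule in the schema rather than in each agent's code is the point: the database becomes the single arbiter of what a legal move is.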

The second is that every agent is a service with a manifest. Each one declares its capabilities, its tools, its dependencies, and the endpoints it exposes. New agents do not get bolted on. They get registered. That is more work upfront than a one-shot integration, but it is the only structure that survives the second year, when you have eight agents instead of three and you still need to know which one is responsible for what.
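A manifest can be as small as a frozen record, and registration can be the point at which the structure is enforced. The sketch below assumes the fields named above (capabilities, tools, dependencies, endpoints) and two hypothetical rules: an agent may only depend on agents that are already registered, and each capability has exactly one owner, so "which one is responsible for what" always has a single answer. `AgentManifest`, `AgentRegistry`, and the agent names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AgentManifest:
    name: str
    capabilities: frozenset[str]
    tools: frozenset[str] = frozenset()
    dependencies: frozenset[str] = frozenset()  # other agents this one calls
    endpoints: tuple[str, ...] = ()


class AgentRegistry:
    def __init__(self) -> None:
        self._agents: dict[str, AgentManifest] = {}

    def responsible_for(self, capability: str) -> str | None:
        """Return the name of the agent that owns a capability, if any."""
        for agent in self._agents.values():
            if capability in agent.capabilities:
                return agent.name
        return None

    def register(self, manifest: AgentManifest) -> None:
        """Admit an agent only if its manifest is consistent with the registry."""
        if manifest.name in self._agents:
            raise ValueError(f"agent {manifest.name!r} is already registered")
        missing = sorted(d for d in manifest.dependencies if d not in self._agents)
        if missing:
            raise ValueError(f"agent {manifest.name!r} depends on unregistered agents {missing}")
        for capability in manifest.capabilities:
            owner = self.responsible_for(capability)
            if owner is not None:
                raise ValueError(f"capability {capability!r} is already owned by {owner!r}")
        self._agents[manifest.name] = manifest


registry = AgentRegistry()
registry.register(AgentManifest(
    name="spec-drafter",
    capabilities=frozenset({"draft_spec_from_transcript"}),
    tools=frozenset({"transcript_store"}),
    endpoints=("/v1/draft-spec",),
))
registry.register(AgentManifest(
    name="test-planner",
    capabilities=frozenset({"plan_tests_from_spec"}),
    dependencies=frozenset({"spec-drafter"}),
    endpoints=("/v1/plan-tests",),
))
print(registry.responsible_for("plan_tests_from_spec"))  # test-planner
```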

The third is that every model call goes through a single chokepoint. Not just logged for debugging, but routed through one function that records the prompt, the response, the model, the cost, and the latency, and feeds them into an evaluation pipeline. The reason is unglamorous and important. The only way to know whether an agent is getting better or worse over time is to have a corpus of its past behaviour to compare against. Teams that skip this step typically discover, six months in, that they have a fleet of agents and no way to tell which ones are decaying.
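A chokepoint of this kind can be a single function that every agent calls instead of calling a model provider directly. The sketch below is deliberately provider-agnostic: the `provider` callable, the per-token price table, the model names, and the in-memory `CALL_LOG` are hypothetical stand-ins for a real client, real pricing, and a durable store feeding the evaluation pipeline.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ModelCallRecord:
    agent: str
    model: str
    prompt: str
    response: str
    cost_usd: float
    latency_s: float


# Hypothetical prices per 1,000 tokens; real figures depend on the provider and model.
PRICE_PER_1K_TOKENS = {"model-small": 0.002, "model-large": 0.01}

# In-memory stand-in for the store that feeds the evaluation pipeline.
CALL_LOG: list[ModelCallRecord] = []


def call_model(
    agent: str,
    model: str,
    prompt: str,
    provider: Callable[[str, str], tuple[str, int]],
) -> str:
    """Single chokepoint: every model call is timed, costed, and recorded here."""
    start = time.perf_counter()
    response, tokens_used = provider(model, prompt)
    latency_s = time.perf_counter() - start
    cost_usd = tokens_used / 1000 * PRICE_PER_1K_TOKENS[model]
    CALL_LOG.append(ModelCallRecord(agent, model, prompt, response, cost_usd, latency_s))
    return response


def history(agent: str) -> list[ModelCallRecord]:
    """The corpus of an agent's past behaviour, for comparison over time."""
    return [record for record in CALL_LOG if record.agent == agent]


# Usage with a fake provider that upper-cases the prompt and reports 500 tokens.
def fake_provider(model: str, prompt: str) -> tuple[str, int]:
    return prompt.upper(), 500


reply = call_model("spec-drafter", "model-small", "draft a spec", fake_provider)
print(reply, len(history("spec-drafter")))  # DRAFT A SPEC 1
```

In practice the log would go to durable storage, and the record would probably be written in a `finally` block so that failed calls are captured too; the point of the sketch is only that there is one place every call passes through.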

None of this is exotic. It is the application of patterns engineering teams have used for decades, applied to a workload that did not exist five years ago. The interesting thing is that the discipline required is mostly old, and the payoff is mostly new.

A note on overshoot

Cycles like this overshoot. There will be a period, and we are arguably in it, during which the value of AI in the SDLC is overstated, the failure modes are understated, and the operational maturity required to deploy these systems safely is treated as a detail. Most organisations will buy too much, integrate too quickly, and discover that the cost of cleaning up an unmaintainable agent estate is comparable to the cost of cleaning up an unmaintainable microservice estate, which is to say, considerable.

The deeper failure mode is more interesting than the obvious one. Most organisations are still running the SDLC they had, with AI bolted onto it. The teams that pull ahead will be the ones running an SDLC built around AI. The distinction looks semantic until you have tried both. One is a faster version of what you used to do. The other is a different thing.

The teams that come out of the next three years with a structural advantage will not be the ones who adopted fastest. They will be the ones who understood, earlier than their competitors, that the work has moved. The interesting questions are upstream. The work is in the decisions, the evaluation, and the memory. The code, finally, is the easy part.