Vibe Coding Got You the Prototype. Now What?
This is the third post in a series about on-demand software. The first covered why most business software is about to become regenerable. The second explained why document standards are the interoperability layer that makes it all work. This one is about the part that makes it trustworthy.
Because here is the uncomfortable truth about vibe coding: it is fantastic for getting something working. It is terrible for keeping it working.
The Prototype Trap
I speak from experience. Earlier this year I built a terminal-based tool for reading community forum data. The first version came together in an afternoon. I described what I wanted, an LLM generated the code, and I had a working TUI by end of day.
Then I used it the next day. And the day after that. And within a week, I had found three bugs, wanted two new features, and needed to refactor the data layer because my initial description had not accounted for edge cases that only showed up with real usage.
That iteration process (finding bugs, adding features, refactoring as requirements clarify) is the software development lifecycle. I was not following a formal SDLC process. I was just doing what every developer does when a prototype becomes a daily-driver tool. But the practices were the same: version control to track changes, testing to catch regressions, and careful iteration to avoid breaking what already worked.
The moment on-demand software becomes something a user depends on, it needs exactly these guardrails. Without them, every improvement risks breaking something else. Every new feature is a coin flip between progress and regression.
Why AI Generates Regressions Fast
LLMs are incredibly good at generating code. They are also incredibly good at generating regressions. The same capability that lets a model build a working application in one session also lets it subtly break that application in the next session when you ask for a change.
This is not a flaw in the technology. It is a fundamental property of how generative models work. Each generation is statistically independent. The model does not remember that the authentication flow depends on a specific token format, or that the date parser expects ISO 8601, or that the CSS layout breaks if you change the grid columns. It generates the best response to the current prompt, and sometimes that response contradicts a decision made in a previous generation.
In traditional software development, this problem is solved by automated testing. You write tests that encode your assumptions, and those tests fail the moment a change violates them. In the AI-generated software world, the same principle applies. You need a pipeline that validates output before it reaches the user.
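To make that concrete, here is a minimal sketch of what "tests that encode your assumptions" means, using the ISO 8601 date parser mentioned above as the example. The `parse_date` helper is hypothetical, not code from my actual tool:

```python
from datetime import date

def parse_date(value: str) -> date:
    """Hypothetical parser: the app assumes ISO 8601 (YYYY-MM-DD) input."""
    return date.fromisoformat(value)

# These assertions encode the assumption. A regenerated parser that
# silently switches formats fails here instead of corrupting data later.
assert parse_date("2024-03-01") == date(2024, 3, 1)

try:
    parse_date("03/01/2024")  # US-style dates must be rejected, not guessed
except ValueError:
    pass
else:
    raise AssertionError("non-ISO input should raise ValueError")
```

The point is not the parser. It is that the assumption now lives in executable form, so the next generation cannot quietly contradict it.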
What a Pipeline Looks Like
I have been building this exact system. My agent pipeline project defines eight workflow stages and nine agent roles that mirror a traditional SDLC process, compressed and automated.
The flow works like this. An Orchestrator agent receives a task description and routes it to a Project Manager agent. The PM produces a structured brief. An Architect agent turns that brief into a technical specification. Engineering agents build from the spec. QA agents run automated tests. Security agents scan for vulnerabilities. Accessibility agents check compliance. And only after all of those gates pass does the output move toward deployment.
Each handoff is a checkpoint. Each checkpoint has defined acceptance criteria. The agents are not making subjective judgment calls about whether something “looks right.” They are validating against specific, codified requirements.
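A rough sketch of that gate structure, in Python. The gate names and thresholds here are illustrative, not my pipeline's actual criteria:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Gate:
    """One checkpoint: a named acceptance criterion applied to an artifact."""
    name: str
    check: Callable[[dict], bool]  # a codified requirement, not a judgment call

def run_pipeline(artifact: dict, gates: list[Gate]) -> list[str]:
    """Run every gate and return the names of the ones that failed."""
    return [g.name for g in gates if not g.check(artifact)]

# Hypothetical gates mirroring the QA / security / accessibility handoffs.
gates = [
    Gate("tests_pass", lambda a: a.get("failing_tests", 1) == 0),
    Gate("no_known_vulns", lambda a: not a.get("vulnerabilities")),
    Gate("a11y_baseline", lambda a: a.get("contrast_ratio", 0) >= 4.5),
]

candidate = {"failing_tests": 0, "vulnerabilities": [], "contrast_ratio": 7.0}
assert run_pipeline(candidate, gates) == []  # every checkpoint passes
```

Notice what the checks have in common: each is a pure function over the artifact. Nothing moves toward deployment on vibes.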
This is SDLC. It is not bureaucracy. It is the thing that makes the difference between a prototype that works once and software that works reliably over time.
The Five-Tier Reality
Not every project needs the full pipeline. A quick utility script does not need a security audit and accessibility review. A production application handling financial data needs all of that and more.
My framework defines five project tiers based on complexity and risk. A micro task, something that takes minutes, gets minimal process. A production deployment gets the full treatment: architecture review, test coverage requirements, security scanning, accessibility validation, and staged rollout.
The insight is that SDLC is not one-size-fits-all. The practices scale with the stakes. Vibe coding is perfectly appropriate for tier one. It is dangerously insufficient for tier four or five. The pipeline provides the right level of rigor for each context.
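In code, tiered rigor is little more than a lookup table. This mapping is a sketch with made-up gate names, not my framework's actual tier definitions:

```python
# Hypothetical tier map: which gates run scales with the stakes.
TIER_GATES = {
    1: ["lint"],                                              # micro task
    2: ["lint", "tests"],
    3: ["lint", "tests", "security_scan"],
    4: ["lint", "tests", "security_scan", "a11y"],
    5: ["lint", "tests", "security_scan", "a11y", "staged_rollout"],
}

def gates_for(tier: int) -> list[str]:
    """Return the validation gates required for a project tier."""
    return TIER_GATES[tier]

assert gates_for(1) == ["lint"]              # vibe coding territory
assert "staged_rollout" in gates_for(5)      # production gets everything
```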
Security Is Not Optional
Here is the part that keeps me up at night. AI-generated code inherits whatever patterns the model learned from training data. Some of those patterns include insecure defaults, outdated authentication methods, and vulnerable dependency versions.
In my pipeline, a git push triggers automated security scanning. Known vulnerability patterns are checked against the generated code. Issues are reported as structured findings that the Engineering agent must resolve before the code can merge.
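The simplest version of pattern-based scanning looks something like this. These two rules are toy examples; a real scanner leans on a maintained ruleset, not a hand-rolled regex list:

```python
import re

# Illustrative patterns only; real scanners use curated, versioned rulesets.
RULES = {
    "hardcoded_secret": re.compile(r"(?i)(api_key|password)\s*=\s*['\"]\w+"),
    "weak_hash": re.compile(r"\bmd5\s*\("),
}

def scan(source: str) -> list[dict]:
    """Return structured findings an Engineering agent must resolve."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append({"rule": rule, "line": line_no})
    return findings

sample = 'password = "hunter2"\ndigest = md5(data)\n'
assert [f["rule"] for f in scan(sample)] == ["hardcoded_secret", "weak_hash"]
```

The structured output is the important part: a finding with a rule name and a line number is something an agent can act on mechanically.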
This is not paranoia. This is standard practice in any mature engineering organization. The difference is that when humans write code, security review happens during code review. When AI generates code, security review needs to be automated, because the volume and velocity of generated code make manual review impractical.
On-demand software that skips security validation is on-demand liability. The tools need to be trustworthy, and trustworthiness comes from process, not from hope.
What This Means for On-Demand Software
If the future I described in the first two posts arrives, and I believe it will, then SDLC practices become the trust infrastructure that makes on-demand software viable for serious use.
A user generates a maintenance tracking tool. The generation pipeline runs it through schema validation to ensure it respects the data contracts from the standards layer. Automated tests verify that the core workflows function correctly. Security scanning checks for common vulnerabilities. Accessibility validation ensures it meets baseline usability standards.
All of this happens in seconds, not weeks. The pipeline is automated, and the guardrails are codified. The user never sees the process. They just get software that works and keeps working.
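The schema-validation step above can be sketched in a few lines. The contract fields here are invented for the maintenance-tracking example, not a real standard from the earlier posts:

```python
# Minimal sketch of validating a record against a hypothetical data
# contract from the standards layer (field names are illustrative).
CONTRACT = {"asset_id": str, "due_date": str, "completed": bool}

def validate(record: dict) -> list[str]:
    """Return human-readable contract violations for one record."""
    errors = []
    for field, expected in CONTRACT.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")
    return errors

ok = {"asset_id": "pump-7", "due_date": "2025-01-15", "completed": False}
assert validate(ok) == []
assert validate({"asset_id": 7}) == [
    "wrong type for asset_id",
    "missing field: due_date",
    "missing field: completed",
]
```

A check this cheap is exactly why the whole gauntlet can run in seconds rather than weeks.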
That is the vision. Not vibe coding with fingers crossed, but AI-generated software backed by the same engineering discipline that makes traditional software reliable. The SDLC is not dead. It is faster, automated, and more necessary than ever.
The next post in this series steps back from the technical and asks a different question: when AI agents review each other’s work, acting as domain experts in QA, security, and architecture, is that review good enough? Or do we still need humans in the loop?

