Stop Letting Your AI Agents Be Generalists

I've been building two apps with AI agents doing most of the heavy lifting. One is a ham radio propagation tool for iOS. The other is a game for my daughter. Both started the same way: I gave the agent a prompt, it built something, and the result was fine. Just fine. Functional, forgettable, and looking like every other AI-generated app out there. Then I changed one thing, and the entire quality of the output shifted.

I gave my agents real opinions.

The first version was boring

When I started building the propagation app, I used a product manager agent to define features and a coding agent to implement them. The output worked. It ran. But it looked and felt like a default template with data plugged in. There was nothing distinctive about it. No personality, no point of view.

The problem wasn't the agents' technical ability. They could write code, structure data, and follow instructions. The problem was that nobody in the process had taste. Nobody was saying "no, that violates a principle I care about." The agents were doing what I asked, but they weren't pushing back on whether what I asked was actually good.

Giving agents a real point of view

I started researching designers who had strong, documented philosophies about how things should look and feel. I landed on Dieter Rams's "less but better" approach. I used deep research to pull together everything about his philosophy, his principles, and his famous quotes about form and function. Then I built a custom skill (a persistent personality and expertise profile that shapes how an agent thinks and responds) based on all of that.

When I ran the app through this new design agent, the entire application changed. It wasn't just a different color scheme or font choice. The structure of screens changed. The hierarchy of information changed. The agent was making decisions based on a coherent philosophy, not just best practices from a training dataset.

But here's the part that surprised me. The design agent started pushing back on the product manager agent. The product manager would propose a feature, and the design agent would say "no, this violates my principles, and here's why." That tension produced better decisions than either agent made on its own.
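As a rough illustration of what such a skill might contain, here is a minimal sketch in Python. The post does not describe the actual skill format, so the `Skill` class, its fields, and the prompt wording are all assumptions. The principles listed are a small sample chosen for illustration, not the full researched profile.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    """Hypothetical container for an agent's persona and principles."""
    name: str                      # the person the persona is based on
    role: str                      # the seat this agent fills on the team
    principles: list[str] = field(default_factory=list)

    def to_system_prompt(self) -> str:
        """Render the profile as plain text an agent could be primed with."""
        lines = [
            f"You are the {self.role}, working in the spirit of {self.name}.",
            "Hold to these principles and object when a proposal violates one:",
        ]
        # Number the principles so the agent can cite the one being violated.
        lines += [f"{i}. {p}" for i, p in enumerate(self.principles, start=1)]
        return "\n".join(lines)


# Illustrative sample only; a real profile would carry far more research.
designer = Skill(
    name="Dieter Rams",
    role="designer",
    principles=[
        "Less, but better.",
        "Good design is as little design as possible.",
        "Good design is honest.",
    ],
)

prompt = designer.to_system_prompt()
print(prompt)
```

Numbering the principles is a deliberate choice in this sketch: it gives the agent something concrete to point at when it says "this violates my principles, and here's why."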

The mixture that actually works

That experience led me to build a mixture of experts. Not the machine learning concept with the same name. More like assembling a small team of opinionated specialists who each bring a distinct perspective to the project.

For the propagation app, the team looks like this:

The product manager. This agent is intentionally vanilla. No strong aesthetic opinions. It focuses on market viability, feature prioritization, and whether something will increase the addressable audience. It builds the PRD (the blueprint document that defines what gets built and why) and sizes the opportunity.

The designer. This is the opinionated one. For the propagation app, I eventually switched from Dieter Rams's philosophy to one inspired by Jony Ive, whose work at Apple focused on products that feel warm and human. The app went from austere and functional to something people actually responded to when I showed it around. Higher engagement, more positive reactions, same audience.

The editor and domain expert. This agent knows ham radio. It reviews every piece of text, every label, every tooltip to make sure the words resonate with the target audience. It also weighs in on whether features actually serve what operators need, not just what looks good on a spec sheet.

Three perspectives. When two disagree, the third breaks the tie. With three voices on a yes-or-no question, there's never a deadlock.
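The tie-break rule can be sketched as a simple majority vote. This is an assumption about how the rule might be encoded, not a description of the actual setup: the `decide` function, the agent names, and the "approve"/"reject" labels are hypothetical.

```python
from collections import Counter

VALID_VOTES = {"approve", "reject"}


def decide(votes: dict[str, str]) -> str:
    """Return the majority decision from exactly three binary votes.

    Three voters choosing between two options always yield a majority,
    which is why the arrangement cannot deadlock.
    """
    if len(votes) != 3:
        raise ValueError("expected exactly three reviewers")
    if not set(votes.values()) <= VALID_VOTES:
        raise ValueError("votes must be 'approve' or 'reject'")
    winner, _count = Counter(votes.values()).most_common(1)[0]
    return winner


# Hypothetical round: the designer and editor outvote the product manager.
result = decide({
    "product_manager": "approve",
    "designer": "reject",
    "editor": "reject",
})
print(result)
```

The guarantee only holds for two-option questions. If the agents could each propose a different third alternative, a three-way split would be possible and some other rule would be needed.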

The pipeline in practice

Here's how it actually flows. The product manager creates the PRD. A separate process breaks that PRD into smaller sub-PRDs, each one scoped to a specific piece of the app (a single screen, a feature, a data flow). Then the mixture of experts reviews each sub-PRD. The designer weighs in on experience. The editor checks messaging and domain accuracy. The product manager defends the business case.

Once the experts have shaped each sub-PRD, the system generates individual development tickets (discrete work items a coding agent can pick up and execute). Every agent picking up a ticket has the full context of what the experts decided and why.

From the product manager's first PRD to tickets landing in the queue, the whole process takes about two to three hours. That's not days of meetings and slide decks. That's a few hours of agents arguing with each other and producing better specs because of it.
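The flow described in this section can be sketched as a small pipeline. This is a simplified sketch under stated assumptions: the experts are stand-in functions rather than real agent calls, the scope names are made up, each sub-PRD yields a single ticket, and every name here (`SubPRD`, `Ticket`, `split_prd`, `review`, `make_ticket`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# An expert takes a sub-PRD scope and returns a review note.
Expert = Callable[[str], str]


@dataclass
class SubPRD:
    scope: str                                       # one screen, feature, or data flow
    notes: list[str] = field(default_factory=list)   # what the experts decided and why


@dataclass
class Ticket:
    title: str
    context: list[str]   # expert decisions carried along to the coding agent


def split_prd(scopes: list[str]) -> list[SubPRD]:
    """Break the PRD into one sub-PRD per scoped piece of the app."""
    return [SubPRD(scope=s) for s in scopes]


def review(sub: SubPRD, experts: dict[str, Expert]) -> SubPRD:
    """Let each expert weigh in and record the note on the sub-PRD."""
    for name, expert in experts.items():
        sub.notes.append(f"{name}: {expert(sub.scope)}")
    return sub


def make_ticket(sub: SubPRD) -> Ticket:
    """Turn a reviewed sub-PRD into a work item that keeps the full context."""
    return Ticket(title=f"Implement {sub.scope}", context=list(sub.notes))


# Stand-in experts; real ones would be agent calls.
experts: dict[str, Expert] = {
    "designer": lambda scope: f"reviewed the experience of {scope}",
    "editor": lambda scope: f"checked wording and domain accuracy for {scope}",
    "product_manager": lambda scope: f"defended the business case for {scope}",
}

prd_scopes = ["band conditions screen", "alert settings"]  # made-up scopes
tickets = [make_ticket(review(sub, experts)) for sub in split_prd(prd_scopes)]

for ticket in tickets:
    print(ticket.title, ticket.context)
```

The point of copying the notes into each ticket is the one made above: whichever coding agent picks up the work sees what the experts decided and why.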

What I learned from the game

The game for my daughter followed a similar pattern but taught me something different. The first version failed. Not because of bad code, but because the scope kept shifting. My daughter didn't know exactly what she wanted at the start, and as development went on, every conversation changed the direction. The app became impossible to maintain.

For the second attempt, I locked the scope. The product manager reverse-engineered the failed codebase into a proper PRD. I reviewed it, approved it, and told my daughter: this is what gets built. If you have new ideas, write them down and we'll look at them after the first version ships.

She pushed back at first. But once I gave her a shared note to capture ideas for later, she was fine with it. Those notes become formal change requests after delivery, not mid-flight pivots.

For the game's design agent, I went with Toby Fox, the creator of Undertale and Deltarune. He handles the full picture: art, audio, mechanics, theme, and feeling. Not a specialist in one narrow area, but a generalist with strong opinions about all of it. That was key. Too many specialists means too many handoffs and too many gaps between disciplines.

Why this matters if you sell developer tools

Here's the part most people miss. My daughter is not a developer. She can't write code. But she directed an AI agent to build a working game, gave feedback on iterations, and shaped the final product. She was the decision-maker in a software development process without writing a single line.

This is exactly what's happening across the industry right now. Non-technical founders, product managers, and business owners are using AI agents to build, evaluate, and select software. The definition of "developer" is expanding fast. People who never would have touched your API or SDK directly are now discovering and adopting tools through their agents.

If you sell an API, SDK, or developer platform, your buyer persona just got a lot wider. The agent doing the building is choosing your tool (or your competitor's) based on what it knows, what it can find, and how well your product fits into the workflow. The human directing that agent may not know what an SDK is, but they're the one deciding what gets built.

Companies that figure out how to reach this new buyer, not just the traditional developer, are going to capture a distribution channel that barely existed two years ago.

The real takeaway

If you're building anything with AI agents, the biggest upgrade you can make isn't better prompts or faster models. It's giving your agents a point of view worth defending.

Base them on opinionated people. Real designers, real product thinkers, real domain experts. Build their principles into the skill so deeply that the agent will argue with other agents when those principles get violated.

The tension is the feature. When a design agent tells a product manager "no, that's wrong, and here's the principle it violates," that's not a bug in your process. That's the moment where your product gets better.

And if you're running a dev tool company, pay attention to who's making these decisions. It's not just senior engineers anymore. It's anyone with access to an AI agent and a problem to solve. Your go-to-market strategy needs to account for both the traditional developer and the non-technical person directing an agent.

Confidence labels: All claims in this post are based on FACT from Phil's direct experience building these projects. The section on dev tool distribution is INFERRED from these experiences combined with observed industry trends.
