
Curating AI Coding Agents: Why 80% AI-Written Code Still Needs a Human in the Driver's Seat

This has little to do with AI coding agents, but it's a cute little "agent" I created a while back that ran around the house autonomously

I agree with Andrej Karpathy's point that over the last three to five months we've gone from AI coding agents writing maybe 20% of our code to something more like 80%. That shift is real and it's dramatic. But here's what I keep coming back to: these agents can create a huge mess in a codebase if left unsupervised.

They make poor design choices. They make weird assumptions about requirements. They leave dead code everywhere. They duplicate code liberally. They reach for cheap shortcuts like unnecessary type casting just because it's quick. Very much like an early-career developer who hasn't yet had the experience of coming back months later to maintain the code they wrote.

This isn't a criticism. AI coding agents are amazing. But they're coding agents. They don't know what we want and they don't know how we want to operate our systems. So curating them is critical.

My own workflow has evolved to manage this. It's a mix of planning, disciplined review, and a few clever configurations. If you adopt some of these and you find yourself shipping faster, let me know!

Planning Mode Is Where the Real Work Happens

With Claude Code, I spend far more time in planning mode than not. I go back and forth. I ask questions. I tell it to search the web and research topics. All of this helps me clarify my own thinking and leads to much better outcomes. The planning phase is where you set the direction. Skip it and you'll spend twice as long cleaning up.

Save Your Plans to the Repo

After playing with tools like Spec Kit to do more detailed planning — more on that tool another day — I started saving plans and implementation summaries directly to my repos. Even when I'm just using Claude Code's built-in planning mode, I save the output. This turned out to be more useful than I expected. I go back and review them. The agents go back and review them. I've started pointing the agent at a plan or summary for a feature and asking it to review why we made a certain decision, or to make sure a change I'm about to make won't conflict with something we decided earlier.
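
For what it's worth, the shape of this in a repo is nothing fancy. The docs/plans/ location, the file names, and the prompt below are just how I'd illustrate it, not a convention any tool requires:

```text
docs/
  plans/
    payments-refactor-plan.md      # saved output of planning mode
    payments-refactor-summary.md   # implementation summary, written after the work landed

# Later, point the agent back at it:
#   "Review docs/plans/payments-refactor-plan.md and tell me whether the change
#    I'm about to make to the retry logic conflicts with anything we decided."
```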

This became important enough that I added it to my user-scoped Claude Code memory (I share mine at the end of this article). That had a nice side effect of its own: now Claude keeps those plans up to date when I go back and fix a bug in a feature or keep iterating on it after the initial commit. That was a real delighter. I didn't ask for it, but once the instruction was in memory, the agent just started doing it.

Code Review Is Non-Negotiable

I don't mind the agent writing large amounts of code while I'm away. But I always review it. I spend real time going through what it produced, telling it things to clean up, asking questions about choices it made.

Here's the uncomfortable math, though. I just finished reviewing a PR on a pretty complex system with a lot of history. The review took me 45 minutes, and I ended up rejecting it for a serious issue subtle enough that only a seasoned engineer who knew where to look would catch it. That review probably took me longer than it took the engineer, armed with an AI coding agent, to produce the code. On one hand, it's impressive that an AI agent can produce changes in a complex system so quickly. But if it takes a more senior engineer longer to review the code than it took to write it, that's a real problem we need to think about. The cost didn't disappear; it shifted to the reviewer.

Which raises the question: by empowering less experienced engineers with AI agents, are we actually taking more time away from our most experienced, highest-performing engineers?

Security: These Agents Are Not to Be Trusted

I want to be blunt about this: these agents are not to be trusted. I've seen them add --no-verify to git commits, bypassing the hooks that enforce commit signing. It was done with the best of intentions; the agent just wanted to get past an obstacle and keep moving. But it's truly terrifying to think about what these agents might do when you're not paying attention.

Keep an eye on them. In Claude Code, avoid the --dangerously-skip-permissions flag unless you have other precautions in place that give you confidence you can trust what it's doing. And speaking of trust — in the Gemini CLI, I'm genuinely shocked that Google puts it in permissive mode by default and gives you a pretty poor UX to change that default. That's a wild choice for a tool that can run arbitrary commands on your machine. We're told not to run even simple scripts as root, but here we are running non-deterministic programs that can be manipulated with plain English and giving them full access to our machines?

The permission prompts are absolutely necessary but definitely lead to some level of fatigue and tempt us to just allow everything or grant permission without looking at the command carefully. One thing I want to explore soon is teaching Claude Code to have a deny list for certain commands. For one example, I'm getting more comfortable with it doing its own commits, but I'm very uncomfortable with it pushing. I'd like to give it broad Git permissions but never allow git push. Claude Code's hooks might allow me to do this. I just haven't tried it yet...
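
In the meantime, Claude Code's permission rules look like they can express that allow-commit-but-never-push split declaratively. Here's a minimal sketch of a user-level ~/.claude/settings.json, based on my reading of the settings docs rather than a config I've battle-tested, so verify the exact rule syntax against the current documentation:

```json
{
  "permissions": {
    "allow": [
      "Bash(git add:*)",
      "Bash(git commit:*)",
      "Bash(git status)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(git push:*)"
    ]
  }
}
```

Deny rules are supposed to take precedence over allow rules, which is exactly the behavior you want here. If that ever proves too easy to route around, a PreToolUse hook on the Bash tool that inspects the command and rejects anything containing git push would be the heavier-handed version of the same idea.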

The agents always seem to try to find the path of least resistance. Sometimes that path goes right through your security guardrails. Treat them like you'd treat a new contractor with sudo access: verify everything.

Testing as a Discipline

I tell Claude "TDD bro" constantly, which, funnily enough, it correctly understands as test-driven development. Write a test that reproduces the bug first, fix it, then verify the fix by running the tests again. I'm a bigger fan than ever of curating a solid set of end-to-end tests. And I probably have 10-20x more unit tests than end-to-end tests.
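
To make that loop concrete, here's its shape in TypeScript. The parseAmount function and its bug are invented for illustration, and I'm assuming a vitest-style runner; the point is only the order of operations: failing test first, fix second, full run third.

```typescript
import { describe, expect, it } from "vitest";
// Hypothetical module under test: imagine parseAmount("1,234.50") currently returns NaN.
import { parseAmount } from "../src/money";

describe("parseAmount", () => {
  // Step 1: write a test that reproduces the bug and watch it fail.
  it("handles thousands separators", () => {
    expect(parseAmount("1,234.50")).toBe(1234.5);
  });

  // Step 2: have the agent fix parseAmount.
  // Step 3: re-run the whole suite so existing behavior is verified too.
  it("still parses plain numbers", () => {
    expect(parseAmount("42")).toBe(42);
  });
});
```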

But I also don't let agents go nuts on testing. I often tell them to simplify tests, remove some, or only test critical paths. Left unchecked, an agent will write exhaustive tests and then end up in a loop playing whack-a-mole with obscure test cases that were written weeks ago that I don't even care about. Keep an eye on this.

Worktrees for Parallel Work

Over the past month, git worktrees have become part of my normal workflow. I can keep Claude busy on a surprising number of features at the same time. I let it write a lot of code while I go off and work on something else, then come back and review.

This doesn't always mean I'm literally working on multiple features at once, though that does happen. More often it's that one feature is out for code review, or on hold for some other reason, and I've shifted to a different one. Worktrees just make juggling all of that much easier than branch switching ever did.
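
If you haven't tried worktrees yet, the mechanics are just a few git commands. The paths and branch name below are placeholders for however you lay out your own checkouts:

```bash
# Create a second checkout of the repo in a sibling directory, on a new branch,
# so an agent can grind away there while the main checkout stays untouched.
git worktree add -b feature/export ../myapp-export

# List every checkout that currently exists for this repo.
git worktree list

# Remove the extra checkout once the branch has merged.
git worktree remove ../myapp-export
```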

CI/CD Monitoring with the GitHub CLI

Another thing I've found valuable is making sure the agent uses the gh CLI to monitor workflow runs. I have it watch deployments and check that CI/CD pipelines complete successfully. I'm a big believer in CI/CD and have some pretty sophisticated pipelines that will often catch things not found in local dev. Having Claude monitor those while I go do something else saves me time. A lot of times when something goes wrong, it sees it, starts fixing it, and has a commit ready by the time I come back.
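
The commands involved are plain gh invocations; the run ID below is obviously a placeholder:

```bash
# List recent workflow runs for the current branch.
gh run list --branch "$(git branch --show-current)" --limit 5

# Watch a specific run; --exit-status makes the command fail if the run fails,
# which gives the agent an unambiguous signal to go read the logs.
gh run watch 1234567890 --exit-status

# Pull only the logs from failed steps so the agent can start diagnosing.
gh run view 1234567890 --log-failed
```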

Linters for Agent Management

I've been a fan of ESLint rules and linters for a long time, but they've been surprisingly helpful for managing agents specifically. These agents produce loads of code, and having a strict ruleset that I deeply understand keeps them on track.

I publish my TypeScript/JavaScript lint rules at activescott/eslint-config, and I've been tightening them up in individual projects specifically to keep the agents in line. I need to go back and tighten the baseline config there too. Over time I'll curate my rules even more aggressively to deny more of the wacky coding practices I see agents reach for. The good news? The agent never complains. Throw the strictest ruleset you can imagine at a team of developers and you might cause a protest. Put it on an AI coding agent and you won't hear a word. They'll see the violations and fix them. Or they might ignore them, but that goes back to doing your own review.
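
To give a flavor of the kind of tightening I mean, here's an illustrative flat-config excerpt. It is not a copy of my published config, so treat the specific rule list as an example:

```javascript
// eslint.config.js - illustrative excerpt, not a complete or published ruleset.
import tseslint from "typescript-eslint";

export default tseslint.config(...tseslint.configs.recommendedTypeChecked, {
  languageOptions: {
    parserOptions: { projectService: true, tsconfigRootDir: import.meta.dirname },
  },
  rules: {
    // Agents love reaching for casts and `any` to silence the compiler quickly.
    "@typescript-eslint/no-unnecessary-type-assertion": "error",
    "@typescript-eslint/no-explicit-any": "error",
    // Dead code and leftover helpers show up constantly in agent output.
    "@typescript-eslint/no-unused-vars": "error",
  },
});
```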

The Agents Are Changing Fast

Maybe a year ago, maybe only six months ago (it's hard to tell these days), I was thinking nobody was going to catch up to Cursor. Then a few months ago I realized I wasn't using Cursor anymore and uninstalled it. I still use multiple agents, but Claude Code is the one I use the most now. Getting too committed to any single agent right now is futile. The space is moving too fast.

The Secret Weapon: Your User-Scoped Agent Memory

Remember when I mentioned adding things to my user-scoped Claude Code memory? This has quietly become the most powerful lever I have.

Most of these agents support some form of persistent memory or instructions — a file that the agent reads at the start of every session. Claude Code calls it CLAUDE.md. I have fairly elaborate ones within individual repos covering project-specific architecture, patterns, and conventions. But I've also started curating a user-scoped version: the best of the best instructions that are reasonably project-agnostic.

Things like "never skip commit signing," "save plans to the repo," "prefer jq over python for parsing JSON on the command line." Small, opinionated rules that keep the agent aligned with how I work, across every project.
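
In file form those rules are just terse bullets. Here's a paraphrased sketch of the kind of thing in my user-scoped CLAUDE.md, not a verbatim copy:

```markdown
<!-- ~/.claude/CLAUDE.md (user-scoped memory, read at the start of every session) -->

## Git
- Never pass --no-verify or otherwise skip commit signing.
- Do not run `git push`; I push after my own review.

## Planning
- Save plans and implementation summaries to the repo (e.g. under docs/plans/).
- Keep those documents up to date when we revisit or iterate on a feature.

## Shell
- Prefer `jq` over python for parsing JSON on the command line.
```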

My user-scoped CLAUDE.md lives in my dotfiles now, right alongside my shell config and git aliases. And like all my dotfiles, it's published on GitHub. Take a look if you're curious — or better yet, start curating your own.
