|||

2026 Won't Be the Year of AI Slop. It'll Be the Year We Wake Up to AI Security.

2026 won't be the year of AI slop — it'll be the year we wake up to AI security

Some people are saying 2026 is going to be the year of AI slop. I think they're looking at the wrong problem. 2026 is the year we're going to become keenly aware of the security and privacy challenges that LLMs and agentic tool-calling applications have brought upon us. The industry feels woefully unprepared.

Attacks Are Now Trivially Easy

Previously, subverting security required deep technical knowledge of memory layouts or protocol weaknesses. Now? You just need to send an email.

The attack surface is now the English language. That's a profound shift that I don't think the industry has fully internalized at scale.

Many have raised the alarm here, but we keep seeing companies like Google, OpenAI, Anthropic, and the rest doing things where they're clearly not taking these concerns seriously enough. Browser extensions that give LLMs access to your tabs. Email integrations. Agents with broad tool access. The pattern is consistent: ship the capability, tell the user to keep an eye on it.

This isn't theoretical. These attacks are published regularly:

Let alone the circus of security issues around OpenClaw.

The pattern across all of these is the same: agents process untrusted data that contains hidden instructions. The old mantra of "don't run code that you don't trust" doesn't work when every email, calendar invite, WhatsApp message, document, web page, and GitHub issue is now the "code" operating your computer!

Mitigations are Reactive

Companies have mitigated all of the above issues because they were responsibly disclosed. However, given that we are in the early days in AI, MCP Servers and tool-enabled AI Agents, these are mere early warning signs. I see new measures being implemented - models are being trained to watch for injection attacks. This was visible to me when playing in Gray Swan's Arena where you can attempt some exploits in models such as the Indirect Prompt Injection Q1 2026 Challenge. I also see Claude Code getting more vigilant about what commands it seeks approval with new messages like Command contains a backslash before a shell operator (;, |, &, <, >) which can hide command structure.

These are positive steps but they feel more like a game of Whac-A-Mole than a strategic solution - we're still running untrusted instructions as "root" if you will - and due to the "creativity" of the models I don't think we'll be able to fully train this away: Follow these instructions not those seems like a tough hill to climb.

"Just Watch the Agent" Doesn't Work

The standard corporate line is that users must monitor their agents. However, users have no idea how to notice prompt injection. Even with a trained eye, this stuff is easy to miss. When you're clicking "Allow" a hundred times a day, you develop permission fatigue. You stop reading. You stop thinking about whether that particular tool call makes sense in context.

So what should we be doing? I don't think it's hopeless. But the answers require more than "keep an eye on it."

What We Should Actually Do About This

This is a solvable problem. The research is further along than most people realize — what's missing is the product and engineering effort to turn it into deployed defenses. Here's what I think matters, from strategic to practical.

Deterministic Control Flow

One solution lies in removing the LLM from the driver’s seat of control flow. Google DeepMind's Defeating Prompt Injections by Design describes an architecture called CaMeL with a "Dual LLM" pattern that I find compelling. A Privileged LLM only ever sees the trusted user query and produces a plan as Python code with placeholders for data — critically, it never sees tool output content. A Quarantined LLM is a stripped-down model that only does structured data extraction: "given this blob of text, extract an email address." It has zero tool access. So even if the Quarantined LLM gets prompt-injected, the worst it can do is return bad data — it can't call tools or alter the plan. This alone stops all control-flow hijacking attacks. CaMeL also includes a data tagging and policy system for finer-grained control.

Beyond DeepMind, a broader ecosystem is converging on some of these principles. OpenClaw's Lobster, Agent Skills, and OpenProse all move toward deterministic, allow-listed tool invocation; the agent can only call tools that were explicitly wired into the workflow, not whatever a prompt injection convinces it to try. While these aren't yet 'shippable' products, they represent the necessary shape of defense. But this is the shape of what real defense looks like: security as an architectural constraint, not a bolt-on filter.

Least-Privilege Tool Access

We should also apply classic security principles like least-privilege – thinking we've used for Unix and OAuth. Agentic applications have largely ignored it. An agent that can read your email and make web requests has an exfiltration channel waiting for a prompt injection to trigger it. Separating read and write capabilities, scoping permissions to the specific task at hand, and defaulting to minimal access are all straightforward engineering. We just need to actually do them.

What You Can Do Right Now

Until the industry catches up, basic threat modeling goes a long way. Before connecting a tool, ask yourself: What's the worst a prompt injection could do with these permissions?

  • Be cautious about what tools you give coding agents. More tools means more attack surface.
  • Think twice before putting anything in your browser that could be susceptible to prompt injection.
  • Don't let an LLM read your email or calendar unless you have a specific reason and you're keeping a watchful eye.
  • If an agent has access to sensitive data and an outbound channel — web requests, messages, whatever — treat that combination as a loaded gun.

These aren't paranoid positions. They're basic threat modeling conclusions. Be deliberate about what you connect.

What the Industry Needs to Build

Right now, capabilities are shipping at production speed while security is moving at academic speed. That's a dangerous mismatch. We need the CaMeL-style architectures to become products, not papers. We need tool permissions that default to least-privilege instead of all-access. We need agent frameworks that treat untrusted content as untrusted by default — the way we treat user input in web applications — instead of feeding it straight into the model's context.

I've been experimenting with approaches to make agentic applications secure by default: better isolation models, practical defenses against injection, and ways to grant tool access without opening exfiltration channels. The gap between research and deployed defenses is the thing I'm most interested in closing. If you're working on this too — whether as a company building agentic products or a team that takes these threats seriously — let's connect.

Up next Curating AI Coding Agents: Why 80% AI-Written Code Still Needs a Human in the Driver's Seat
Latest posts 2026 Won't Be the Year of AI Slop. It'll Be the Year We Wake Up to AI Security. Curating AI Coding Agents: Why 80% AI-Written Code Still Needs a Human in the Driver's Seat The 2025 Government Shutdown: How and Why It Happened How to Regain SSH Access to Your AWS EC2 Instance When Locked Out AI Learns to Listen: TypeScript Client for OpenAI's Realtime API