
Some people are saying 2026 is going to be the year of AI slop. I think they're looking at the wrong problem. 2026 is the year we're going to become keenly aware of the security and privacy challenges that LLMs and agentic tool-calling applications have brought upon us. The industry feels woefully unprepared.
Previously, subverting security required deep technical knowledge of memory layouts or protocol weaknesses. Now? You just need to send an email.
The attack surface is now the English language. That's a profound shift that I don't think the industry has fully internalized at scale.
Many have raised the alarm here, but we keep seeing companies like Google, OpenAI, Anthropic, and the rest doing things where they're clearly not taking these concerns seriously enough. Browser extensions that give LLMs access to your tabs. Email integrations. Agents with broad tool access. The pattern is consistent: ship the capability, tell the user to keep an eye on it.
This isn't theoretical. These attacks are published regularly:
Let alone the circus of security issues around OpenClaw.
The pattern across all of these is the same: agents process untrusted data that contains hidden instructions. The old mantra of "don't run code that you don't trust" doesn't work when every email, calendar invite, WhatsApp message, document, web page, and GitHub issue is now the "code" operating your computer!
Companies have mitigated all of the above issues because they were responsibly disclosed. However, given that we are in the early days in AI, MCP Servers and tool-enabled AI Agents, these are mere early warning signs. I see new measures being implemented - models are being trained to watch for injection attacks. This was visible to me when playing in Gray Swan's Arena where you can attempt some exploits in models such as the Indirect Prompt Injection Q1 2026 Challenge. I also see Claude Code getting more vigilant about what commands it seeks approval with new messages like Command contains a backslash before a shell operator (;, |, &, <, >) which can hide command structure.
These are positive steps but they feel more like a game of Whac-A-Mole than a strategic solution - we're still running untrusted instructions as "root" if you will - and due to the "creativity" of the models I don't think we'll be able to fully train this away: Follow these instructions not those seems like a tough hill to climb.
The standard corporate line is that users must monitor their agents. However, users have no idea how to notice prompt injection. Even with a trained eye, this stuff is easy to miss. When you're clicking "Allow" a hundred times a day, you develop permission fatigue. You stop reading. You stop thinking about whether that particular tool call makes sense in context.
So what should we be doing? I don't think it's hopeless. But the answers require more than "keep an eye on it."
This is a solvable problem. The research is further along than most people realize — what's missing is the product and engineering effort to turn it into deployed defenses. Here's what I think matters, from strategic to practical.
One solution lies in removing the LLM from the driver’s seat of control flow. Google DeepMind's Defeating Prompt Injections by Design describes an architecture called CaMeL with a "Dual LLM" pattern that I find compelling. A Privileged LLM only ever sees the trusted user query and produces a plan as Python code with placeholders for data — critically, it never sees tool output content. A Quarantined LLM is a stripped-down model that only does structured data extraction: "given this blob of text, extract an email address." It has zero tool access. So even if the Quarantined LLM gets prompt-injected, the worst it can do is return bad data — it can't call tools or alter the plan. This alone stops all control-flow hijacking attacks. CaMeL also includes a data tagging and policy system for finer-grained control.
Beyond DeepMind, a broader ecosystem is converging on some of these principles. OpenClaw's Lobster, Agent Skills, and OpenProse all move toward deterministic, allow-listed tool invocation; the agent can only call tools that were explicitly wired into the workflow, not whatever a prompt injection convinces it to try. While these aren't yet 'shippable' products, they represent the necessary shape of defense. But this is the shape of what real defense looks like: security as an architectural constraint, not a bolt-on filter.
We should also apply classic security principles like least-privilege – thinking we've used for Unix and OAuth. Agentic applications have largely ignored it. An agent that can read your email and make web requests has an exfiltration channel waiting for a prompt injection to trigger it. Separating read and write capabilities, scoping permissions to the specific task at hand, and defaulting to minimal access are all straightforward engineering. We just need to actually do them.
Until the industry catches up, basic threat modeling goes a long way. Before connecting a tool, ask yourself: What's the worst a prompt injection could do with these permissions?
These aren't paranoid positions. They're basic threat modeling conclusions. Be deliberate about what you connect.
Right now, capabilities are shipping at production speed while security is moving at academic speed. That's a dangerous mismatch. We need the CaMeL-style architectures to become products, not papers. We need tool permissions that default to least-privilege instead of all-access. We need agent frameworks that treat untrusted content as untrusted by default — the way we treat user input in web applications — instead of feeding it straight into the model's context.
I've been experimenting with approaches to make agentic applications secure by default: better isolation models, practical defenses against injection, and ways to grant tool access without opening exfiltration channels. The gap between research and deployed defenses is the thing I'm most interested in closing. If you're working on this too — whether as a company building agentic products or a team that takes these threats seriously — let's connect.