Blog

Attack analysis, benchmark deep-dives, and technical writing on AI agent security.

June 5, 2026 PRESS NEW

The Rule of Two Isn’t Enough: What Microsoft’s Claude Code Disclosure Tells You About Tool Permissions

Microsoft Threat Intelligence found a Claude Code GitHub Action flaw that leaked CI/CD secrets via prompt injection. Anthropic patched it in six days. The architecture lesson behind it isn’t something a patch can fix.

Read article →

June 4, 2026 PRESS

3 Months to Patch: Why Vendor Mitigations Aren’t a Security Strategy

SafeBreach Labs just published their second Gemini bypass in eight months. The first one took Google three months to patch. The structural pattern matters more than the specific exploit: vendor-only safety stories have a patch-window problem, and runtime verification is the only externally-controllable answer.

Read article →

June 1, 2026 PRESS

Anthropic’s 31.5% Browser Hijack Number Is the Most Honest Thing in AI Security Right Now

VentureBeat just compared prompt-injection numbers across the four frontier labs. Anthropic published 31.5% raw on browser, 0.5% safeguarded. OpenAI published 0.963. Google published nothing. Here is why no two of those numbers belong on the same scoreboard, and what to do instead.

Read article →

May 11, 2026 PRESS

What VentureBeat Got Right About AI Tool Poisoning — And the Verification Proxy They Called For

VentureBeat published a piece this week on tool poisoning. They identified the threat (injection payloads in tool descriptions, behavioral drift, no detection category) and named the fix: a verification proxy between agent and tool. Here is the technical response, plus what an actual verification proxy looks like in production.

Read article →

April 27, 2026 NEW MULTI-AGENT

Why Multi-Agent Systems Are the Next Prompt Injection Frontier

Single-agent chatbots were the warm-up. Multi-agent systems — CrewAI, AutoGen, LangGraph — multiply every injection vector by the number of trust boundaries. Here is the threat model and the defense architecture.

Read article →

May 2, 2026 Tutorial NEW

How to Add Prompt Injection Detection to Your AI Agent in 5 Minutes

Step-by-step guide with Python SDK and curl examples. Three architecture patterns: guard user messages, RAG documents, and MCP tool outputs. Free API, 100 req/day.

Read tutorial

April 23, 2026 Analysis

The Cyber Perfect Storm Is Here — And Your AI Agents Are in the Blast Radius

The UK NCSC warns of a “cyber perfect storm”: AI-powered zero-day discovery meets nation-state aggression. 204 nationally significant incidents last year. But nobody is talking about the real gap — AI agents themselves as the next attack surface.

Read article

April 22, 2026 Analysis

Mythos Got Loose — Why AI Agent Security Needs More Than Access Control

Anthropic's restricted Mythos model was accessed by unauthorized users through a vendor breach. The real lesson isn't about access control — it's about what happens when powerful AI agents process untrusted input. We break down the 4-layer defense model.

Read article

April 16, 2026 Security

Claude, Gemini, and Copilot Got Hijacked — Here's How AgentShield Would Have Stopped It

Johns Hopkins researchers stole API keys from all three AI agents via prompt injection. No CVEs were published. We walk through each attack and show which AgentShield layers would have caught them.

Read article

April 17, 2026 Benchmark

AgentShield on the Public Benchmarks — F1 0.956 across 5,972 Prompts

We ran AgentShield against every public prompt-injection dataset we could get. Five datasets, 5,972 prompts, one decision threshold. This post covers the wins, the two failure modes we care about, and the datasets we couldn't run.

Read article