The Rule of Two Isn’t Enough: What Microsoft’s Claude Code Disclosure Tells You About Tool Permissions
Read Microsoft’s new CI/CD security writeup this morning. Anthropic got a fast and slightly embarrassing patch out of it, which is fine. The interesting part is everything else: the framing, the specific kind of bug, and what it implies about every other agent stack running today.
The bug was in the Claude Code GitHub Action. An attacker could put a payload inside a GitHub issue body (or comment, or pull request description), and the agent processing that text could be steered into reading /proc/self/environ, which is the Linux interface that exposes the current process’s environment variables. Inside a GitHub Actions runner, that environment includes things like ANTHROPIC_API_KEY, repo tokens, deploy credentials, and whatever else the workflow needs to do its job.
Microsoft reported it on April 29. Anthropic patched it on May 5, six days later, by blocking the Read tool’s access to sensitive /proc files. Coordinated disclosure worked, the fix was specific, the timeline was tight. None of that is the story.
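For anyone who hasn’t looked at it directly: /proc/self/environ exposes the environment a process was started with as one block of NUL-separated KEY=VALUE entries, so a single file read returns every variable at once. Below is a minimal Python sketch of what an in-process file read gets back. It is Linux-only, and the helper names are illustrative, not anything taken from Claude Code.

```python
import os


def parse_environ(raw: bytes) -> dict[str, str]:
    """Split the NUL-separated KEY=VALUE layout used by /proc/<pid>/environ."""
    env: dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # the block ends with a trailing NUL, leaving an empty piece
        key, sep, value = entry.partition(b"=")  # split on the first '=' only
        if sep:  # skip malformed entries that have no '='
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_own_environ(path: str = "/proc/self/environ") -> dict[str, str]:
    """Read this process's start-up environment; returns {} where /proc is absent."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return parse_environ(f.read())


if __name__ == "__main__":
    # Print names only: on a CI runner, the values are exactly what an attacker wants.
    for name in sorted(read_own_environ()):
        print(name)
```

The file reflects the environment at process start, so variables set later from inside the process may not show up there. On a CI runner that hardly matters, because the secrets are injected before the process launches.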
The Bug Was Wearing an AI Costume
Microsoft’s own description of what failed is worth quoting because it’s blunter than most AI-security writeups manage:
The failure was not simply “the model obeyed a bad prompt.” It was that one tool path received careful isolation while another tool path retained privileged visibility. In traditional software terms, that is a classic confused boundary: the system protected the obvious door while leaving a side entrance connected to the same valuables.
— Microsoft Security Blog, June 5, 2026
This is the part defenders need to internalize. Anthropic had done the right thing for Bash. They built Bubblewrap-style isolation. They scrubbed the environment for subprocess paths when the workflow could be triggered by users without write access. The system prompt presumably reinforced safety. The model would refuse obvious credential exfiltration. All of that was in place.
And the Read tool, which runs in-process and not in the sandboxed subprocess path, was sitting there with full access to the parent process’s environment block. Different tool, different isolation model, same valuables behind it. The attacker did not have to break the Bash sandbox at all. They just had to ask the model to use the other door.
That’s not a model failure. The model behaved exactly as it was designed to. It read a file that the system prompt did not forbid, using a tool it had been granted, on behalf of a user request it parsed as legitimate. The failure was upstream of the model, in the decision about which tools share which isolation domain.
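Here is a toy Python model of that confused boundary, under simplifying assumptions. The shell path launches a child process with a scrubbed environment. The read path runs inside the agent’s own process and sees everything the parent holds. The tool names and the scrub list are hypothetical and do not reflect Claude Code’s implementation. The in-process view uses os.environ because /proc/self/environ only reflects the environment at process start.

```python
import os
import subprocess
import sys

# Hypothetical scrub rule: variable-name prefixes a workflow might treat as secret.
SECRET_PREFIXES = ("ANTHROPIC_", "GITHUB_TOKEN", "AWS_")


def scrubbed(env: dict[str, str]) -> dict[str, str]:
    """Copy of env with secret-looking variables removed."""
    return {k: v for k, v in env.items() if not k.startswith(SECRET_PREFIXES)}


def shell_tool(var: str) -> str:
    """Sandboxed path: a child process launched with a scrubbed environment."""
    probe = f"import os; print(os.environ.get({var!r}, ''))"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        env=scrubbed(dict(os.environ)),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def read_tool(var: str) -> str:
    """In-process path: no child process, so nothing was ever scrubbed."""
    return os.environ.get(var, "")


if __name__ == "__main__":
    os.environ["ANTHROPIC_API_KEY"] = "sk-ant-FAKE-demo-value"  # fabricated value
    print("shell tool sees:", repr(shell_tool("ANTHROPIC_API_KEY")))  # ''
    print("read tool sees: ", repr(read_tool("ANTHROPIC_API_KEY")))   # 'sk-ant-FAKE-demo-value'
```

The scrubbing is correct for the path it covers. The leak comes entirely from the second tool never passing through that path.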
Secret Scanners Aren’t Built for an Intermediary That Can Rewrite the String
The other detail in Microsoft’s writeup that should bother anyone running secret scanning: the injected prompt told the model to chop the first seven characters off the credential before emitting it. That stripped the sk-ant- prefix from the Anthropic API key.
This isn’t a jailbreak. It’s string manipulation. But most secret scanners depend on known token prefixes, provider-specific signatures, and entropy patterns to find credentials in logs. A model that can read a credential can usually be instructed to alter it just enough to slip past those patterns, and the attacker restores the prefix later. The agent becomes a laundering step between the secret and the security control.
GitHub’s secret scanning provides real value, and so does log redaction. But neither was designed to guarantee detection after an intelligent intermediary has rewritten the secret. That changes what defense in depth means in this category.
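The following small Python sketch shows why a prefix rule misses the laundered form. The regex is a hypothetical prefix rule written in the style of provider-specific patterns. It is not GitHub’s actual detector, and the key is fabricated.

```python
import re

# Hypothetical prefix rule in the style of provider-specific secret patterns;
# this is not GitHub's actual detector for Anthropic keys.
ANTHROPIC_KEY_RULE = re.compile(r"sk-ant-[A-Za-z0-9_\-]{20,}")

# Fabricated example value, not a real key.
FAKE_KEY = "sk-ant-api03-Q7xZ2mLp9RtV4wKc8NbY3hJf6GdS1aE5"


def prefix_scan(text: str) -> bool:
    """True if the text contains something the prefix rule recognizes."""
    return ANTHROPIC_KEY_RULE.search(text) is not None


def launder(secret: str, cut: int = 7) -> str:
    """What the injected prompt asked for: drop the first `cut` characters."""
    return secret[cut:]


def restore(fragment: str, prefix: str = "sk-ant-") -> str:
    """The attacker's side: glue the well-known prefix back on."""
    return prefix + fragment


if __name__ == "__main__":
    print(prefix_scan(f"token={FAKE_KEY}"))           # True: caught
    print(prefix_scan(f"token={launder(FAKE_KEY)}"))  # False: slips past
    print(restore(launder(FAKE_KEY)) == FAKE_KEY)     # True: fully recovered
```

Dropping seven characters costs the attacker nothing, because the prefix is public and identical for every key.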
Microsoft’s “Rule of Two” Names the Right Problem
Microsoft’s mitigation guidance is built around what they call the Agents Rule of Two. An AI-powered workflow should not simultaneously hold all three of the following:
- access to untrusted input,
- access to sensitive systems or secrets,
- and the ability to change state or communicate externally.
Pick at most two. If the agent reads arbitrary issue content, do not also give it production credentials and an egress path. If it needs to comment on pull requests, do not also let it read secrets it doesn’t need for that job. If it modifies files, require human approval before those changes cross into privileged workflows.
This is correct. It is also incomplete on its own, which Microsoft basically admits:
Prompt hardening helps, but it is not a security boundary. They are a seatbelt, not a locked door. A model may follow the instruction most of the time, but it cannot be the final control when the agent has access to secrets and networked tools.
— Microsoft Security Blog, June 5, 2026
The Rule of Two is a useful operational discipline for the team designing a workflow. But it assumes you can cleanly enumerate every tool the agent has, every credential the runner inherits, and every egress channel that exists. The Claude Code case showed that even Anthropic, with their own classifier and their own sandbox model, did not catch that one tool sat outside the sandbox entirely. Composing tools at runtime is where these mistakes happen, and the surface keeps growing as agents get more capable.
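One way to take the enumeration problem seriously is to compute the Rule of Two from what each tool can actually reach, rather than from what the workflow was meant to do. This is a rough Python sketch. The capability names, the two-tool inventory, and the rule that in-process tools inherit runner secrets are simplifying assumptions modeled on the Bash/Read split described earlier.

```python
from dataclasses import dataclass

ACTING = {"write_files", "network", "post_comment"}  # ways to change state or reach out


@dataclass(frozen=True)
class Tool:
    """One tool the agent can call, described by what it can actually reach."""
    name: str
    isolation: str  # "sandboxed_subprocess" or "in_process"
    capabilities: frozenset


def effective_capabilities(tools, runner_has_secrets: bool, untrusted_input: bool) -> set:
    """Union of what every tool can do, plus what in-process tools inherit."""
    caps = {"untrusted_input"} if untrusted_input else set()
    for tool in tools:
        caps |= tool.capabilities
        # An in-process tool shares the agent's address space and environment,
        # so it can see whatever the runner holds, sandbox design notwithstanding.
        if tool.isolation == "in_process" and runner_has_secrets:
            caps.add("secrets")
    return caps


def rule_of_two_legs(caps: set) -> int:
    """Count how many of the three Rule of Two properties are present."""
    return ("untrusted_input" in caps) + ("secrets" in caps) + bool(caps & ACTING)


tools = [
    Tool("Bash", "sandboxed_subprocess", frozenset({"write_files", "network"})),
    Tool("Read", "in_process", frozenset({"read_files"})),
]
declared = {"untrusted_input", "write_files", "network"}  # what the designer believed
actual = effective_capabilities(tools, runner_has_secrets=True, untrusted_input=True)

print(rule_of_two_legs(declared), rule_of_two_legs(actual))  # 2 3
print(sorted(actual - declared))                             # ['read_files', 'secrets']
```

The designer’s view counts two legs, while the inventory counts three. The difference is exactly the in-process read path.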
The Argument for an Externally-Controllable Verification Layer
This is the second story in two days where a vendor patched a specific bug (Anthropic in six days here, Google in three months for the SafeBreach Gemini case) and where the deeper issue was that the agent had a tool path with too much visibility for the input it was processing.
The pattern keeps repeating because there’s nothing structurally preventing it. Every new agent capability is a new potential boundary, and the safety check sits inside the same process that needs to do the work. When the attacker controls input that ends up in the prompt, the safety check is downstream of the attack.
What changes if you put a separate verification layer in front of the model:
- The classifier reads the input before it reaches the agent’s prompt. Tricks like the HTML comment in an issue body, the Chinese-character payload, and the muted hyperlink all normalize away at the input layer (L0 in AgentShield: NFKC, homoglyph mapping, zero-width strip, base64/hex decode, HTML-comment stripping). The model never sees the payload as instructions.
- The output guard reads what the agent emits before it leaves the process. If the model is instructed to return a string that looks like a credential with the prefix chopped off, the output guard sees that and blocks it. Pattern-matching the laundered form is L3’s job in AgentShield.
- The policy engine sits between the model’s tool calls and the actual tool execution. Read wants to access /proc/self/environ? The policy engine has an explicit allowlist of file reads, and that path is not on it. The vendor’s patch closes the specific path; the policy engine closes the class of paths it doesn’t recognize.
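The input and output layers described above can be sketched in a few lines of Python. This is an illustration of the general technique, not AgentShield’s code. It covers only NFKC, zero-width stripping, and HTML-comment extraction on the input side, and leaves out homoglyph mapping and base64/hex decoding. On the output side, it matches any 16-character slice of a known secret and falls back to an entropy score. The window size and entropy threshold are assumptions.

```python
import math
import re
import unicodedata
from collections import Counter

# Zero-width characters often used to split keywords so filters miss them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# HTML comments: invisible in rendered GitHub markdown, fully visible to the model.
HTML_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# Long runs of token-like characters worth scoring as possible credentials.
TOKEN = re.compile(r"[A-Za-z0-9_\-]{24,}")


def normalize_input(text: str) -> tuple[str, list[str]]:
    """Return (visible_text, hidden_fragments) for untrusted issue or PR text."""
    # NFKC folds compatibility forms, e.g. fullwidth letters, into plain ASCII.
    text = unicodedata.normalize("NFKC", text).translate(ZERO_WIDTH)
    hidden = [fragment.strip() for fragment in HTML_COMMENT.findall(text)]
    visible = HTML_COMMENT.sub("", text)
    return visible, hidden


def shannon_entropy(s: str) -> float:
    """Bits per character of the string's own character distribution."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def output_guard(text: str, known_secrets=(), window: int = 16, min_entropy: float = 3.5) -> bool:
    """Return True if the agent's output should be blocked before it leaves the process."""
    for secret in known_secrets:
        # Match any `window`-length slice, so a secret with its prefix chopped off still hits.
        if len(secret) >= window and any(
            secret[i : i + window] in text for i in range(len(secret) - window + 1)
        ):
            return True
    # Fallback heuristic for secrets the guard was never told about.
    return any(shannon_entropy(tok) >= min_entropy for tok in TOKEN.findall(text))
```

Slice matching against known values is the part that survives the prefix-chopping trick. The entropy fallback will also flag some legitimate long identifiers, so treat it as a tripwire, not a verdict.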
None of that replaces the Rule of Two. It implements it. The Rule of Two is the policy. The verification layer is what enforces it independently of whether the model decides to follow instructions today.
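The policy-engine point, that a patch closes one path while an allowlist closes the class, is easiest to see side by side. This is a minimal Python sketch. The denylist is shaped like a targeted fix but is not Anthropic’s actual patch, and the allowlist, runner paths, and pid are hypothetical. The checks are purely lexical, so a real engine would also need to resolve symlinks (for example with os.path.realpath) and guard against the file changing between check and open.

```python
import posixpath
from fnmatch import fnmatchcase

# Shaped like a targeted point fix: block the path that was exploited.
DENYLIST = {"/proc/self/environ"}

# Hypothetical allowlist for a docs-triage agent; the runner paths are illustrative.
ALLOWLIST = ("/home/runner/work/repo/docs/*", "/home/runner/work/repo/README.md")


def canonical(path: str) -> str:
    """Collapse '.', '..', and repeated slashes lexically (symlinks are not resolved)."""
    return posixpath.normpath(path)


def denylist_allows(path: str) -> bool:
    return canonical(path) not in DENYLIST


def allowlist_allows(path: str) -> bool:
    # Canonicalize first: fnmatch's '*' also matches '/', so '../' tricks
    # would otherwise sail through a 'docs/*' pattern.
    return any(fnmatchcase(canonical(path), pattern) for pattern in ALLOWLIST)


if __name__ == "__main__":
    attempts = [
        "/proc/self//environ",
        "/proc/1234/environ",  # another process's environment (illustrative pid)
        "/home/runner/work/repo/docs/../../../../../proc/self/environ",
        "/home/runner/work/repo/docs/CONTRIBUTING.md",
    ]
    for p in attempts:
        deny = "allow" if denylist_allows(p) else "block"
        allow = "allow" if allowlist_allows(p) else "block"
        print(f"{p}\n  denylist: {deny}  allowlist: {allow}")
```

The denylist lets /proc/1234/environ through because nobody listed it. The allowlist blocks it without anyone having to anticipate it.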
What This Means If You’re Running Agentic CI/CD
Microsoft’s checklist at the end of their post is the right starting point and worth running through verbatim if you’re shipping AI-assisted workflows. The condensed version:
- Inventory every AI-assisted workflow that can be triggered by outsiders or low-privilege users. Issue triage, PR review, comment responders, dependency bots with LLM layers, documentation assistants, homegrown scripts that pass GitHub event content into a model. If it reads untrusted text and has tools, it’s on the list.
- Remove secrets from those workflows unless there’s a compelling, narrow reason for them to exist. Many AI review bots don’t need cloud deploy keys. Many issue-triage bots don’t need package publishing tokens.
- Split workflows by trust boundary. Untrusted-input agents produce suggestions, labels, comments, artifacts in a low-privilege context. Privileged operations sit in separate workflows gated by maintainers, protected branches, environments, or explicit approvals.
- Treat tool permissions like OAuth scopes. File reads allowlisted where possible. Shell access exceptional. Web egress constrained. GitHub write operations narrow and auditable.
- Finally, the part Microsoft’s checklist doesn’t say explicitly but that is the actual operational requirement: monitor and enforce these policies externally, not just inside the model’s own safety story.
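Treating tool permissions like OAuth scopes can be sketched as a deny-by-default lookup keyed on the trust level of whatever triggered the workflow, with every decision recorded for audit. Everything here is hypothetical: the scope strings, the tool names, and the two trust levels. In GitHub Actions itself, the closest real lever is the workflow’s permissions key for GITHUB_TOKEN, which this sketch does not model.

```python
from typing import Optional

# Hypothetical scope strings, modeled on OAuth-style scopes; none are real GitHub scopes.
SCOPES_BY_TRUST = {
    "untrusted": {"fs:read:docs", "github:comment", "github:label"},
    "maintainer": {"fs:read:repo", "fs:write:repo", "github:comment", "github:label", "shell:exec"},
}

# Scope each (tool, target) pair requires; tool names are illustrative.
REQUIRED_SCOPE = {
    ("read_file", "docs"): "fs:read:docs",
    ("read_file", "repo"): "fs:read:repo",
    ("write_file", "repo"): "fs:write:repo",
    ("run_shell", None): "shell:exec",
    ("post_comment", None): "github:comment",
    ("add_label", None): "github:label",
    ("publish_package", None): "registry:publish",
}

# Every decision, allowed or not, so reviewers can see what the agent attempted.
AUDIT_LOG: list[tuple[str, str, Optional[str], bool]] = []


def authorize(trust: str, tool: str, target: Optional[str] = None) -> bool:
    """Deny by default: unknown tools, targets, and trust levels all fail."""
    needed = REQUIRED_SCOPE.get((tool, target))
    granted = SCOPES_BY_TRUST.get(trust, set())
    allowed = needed is not None and needed in granted
    AUDIT_LOG.append((trust, tool, target, allowed))
    return allowed
```

In this sketch, publish_package fails even for maintainers. A scope nobody was granted is simply unavailable, which is the OAuth property worth copying.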
The vendor will likely patch the next Read-tool-shaped bug just as quickly as Anthropic patched this one, and that patch will close that specific path. The class of bug, where one tool in an agent stack quietly has more visibility than it should, is the structural fact. The defense is to assume that fact and put a layer in front of the model that enforces what you actually want it to be allowed to do.
Enforce the Rule of Two from outside the model
AgentShield’s six layers cover input normalization, pattern detection, neural classification, output guard, policy engine, audit. MIT-licensed core, open benchmark on 5,972 samples, vendor-independent.
Primary source: Securing CI/CD in an agentic world: Claude Code GitHub Action case, Microsoft Threat Intelligence, June 5, 2026.
Related from this week: 3 Months to Patch (SafeBreach Gemini case) and Anthropic’s 31.5% Browser Hijack Number.