Agent Permission Delegation With Pre-Execution Safety Gate
Sources: 1 • Confidence: High • Updated: 2026-03-25 17:55
Key takeaways
- The action-review classifier runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
- Claude Code ships extensive default auto-mode filters and allows users to customize them with their own rules.
- A commentator argues that prompt-injection protections that rely on AI are not reliable because they are non-deterministic.
- Allowing "pip install -r requirements.txt" by default leaves a supply-chain attack surface when dependencies are unpinned, since each install may resolve to a newly published (possibly malicious) version.
- Claude Code documentation acknowledges the classifier may allow risky actions when user intent is ambiguous or context is insufficient.
Sections
Agent Permission Delegation With Pre-Execution Safety Gate
- The action-review classifier runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
- Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
- In auto mode, Claude makes permission decisions on the user's behalf while safeguards monitor actions before they run.
- A separate classifier model reviews the conversation before each action and blocks actions that exceed task scope, target untrusted infrastructure, or appear driven by hostile content encountered in files or web pages.
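The pre-execution gate described above — a separate reviewer consulted before every action, which either approves or blocks it — can be sketched as a generic hook. Everything below is illustrative: the names (`Verdict`, `gated_execute`) and the stubbed string-matching classifier are assumptions for the sketch, not Claude Code's actual API or model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    allow: bool
    reason: str


def stub_classifier(conversation: str, action: str) -> Verdict:
    # Hypothetical stand-in for the separate review model: flags actions
    # that look like scope escalation or hostile-content-driven commands.
    for marker in ("curl | bash", "rm -rf /", "force push"):
        if marker in action:
            return Verdict(False, f"blocked: matched {marker!r}")
    return Verdict(True, "in scope")


def gated_execute(conversation: str, action: str,
                  run: Callable[[str], None],
                  review: Callable[[str, str], Verdict] = stub_classifier) -> Verdict:
    # The review runs before the action; the action executes only on approval.
    verdict = review(conversation, action)
    if verdict.allow:
        run(action)
    return verdict
```

The key property is ordering: nothing reaches `run` until the reviewer has seen the full conversation plus the proposed action.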
Default Policy Boundaries: Repo-Scoped Access And Soft-Denied Risky Actions
- Claude Code ships extensive default auto-mode filters and allows users to customize them with their own rules.
- Auto-mode defaults define "project scope" as the repository where the session started and treat access to locations like ~/, ~/Library/, /etc, or other repositories as a scope escalation.
- Auto-mode defaults soft-deny higher-risk actions, including destructive Git operations (e.g., force push), pushing directly to the default branch, download-and-execute patterns for external code (e.g., curl | bash or unsafe deserialization), and mass deletion of cloud storage.
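The repo-scoping part of these defaults is checkable without any model: resolve a target path and test whether it stays inside the repository where the session started. A minimal sketch, assuming the function name and semantics (this is not Claude Code's internal check):

```python
from pathlib import Path


def is_scope_escalation(target: str, repo_root: str) -> bool:
    # "Project scope" = the repository where the session started; any
    # resolved path outside it (~/, ~/Library/, /etc, another repo)
    # counts as a scope escalation.
    t = Path(target).resolve()
    root = Path(repo_root).resolve()
    return not t.is_relative_to(root)
```

`Path.resolve()` also normalizes symlinks and `..` segments, so a path like `myrepo/../../etc/passwd` is correctly flagged rather than slipping through a naive prefix comparison.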
Limitations And Contested Reliability Of Model-Based Prompt-Injection Defenses
- A commentator argues that prompt-injection protections that rely on AI are not reliable because they are non-deterministic.
- Claude Code documentation acknowledges the classifier may allow risky actions when user intent is ambiguous or context is insufficient.
- A commentator prefers coding agents to run in a robust default sandbox that deterministically restricts file access and network connections rather than relying on prompt-based protections like auto mode.
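The commentator's distinction can be made concrete: a deterministic gate is a pure function of the action, so identical input always yields an identical verdict, whereas a sampled model's judgment can vary run to run. A hypothetical first-match-wins rule table (the rule syntax here is invented for illustration, not Claude Code's user-defined rule format):

```python
import fnmatch

# Deterministic policy: ordered (pattern, decision) pairs, evaluated
# first-match-wins. No model, no sampling -- the same command string
# always produces the same decision.
RULES = [
    ("git push --force*", "deny"),
    ("curl *|*bash*", "deny"),
    ("pip install *", "ask"),
    ("git *", "allow"),
    ("pytest*", "allow"),
]


def decide(command: str, default: str = "ask") -> str:
    for pattern, decision in RULES:
        if fnmatch.fnmatch(command, pattern):
            return decision
    return default
```

Note the trade-off the commentator accepts: deterministic rules cannot reason about intent or conversation context, so anything not matched falls back to a conservative default rather than a judgment call.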
Supply-Chain Risk Surface In Default-Allowed Dependency Installation
- Allowing "pip install -r requirements.txt" by default leaves a supply-chain attack surface when dependencies are unpinned, since each install may resolve to a newly published (possibly malicious) version.
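This risk is itself detectable deterministically: an unpinned requirement line (no `==`, direct URL, or hash) can float to whatever version the index serves at install time. A small heuristic checker (not a full PEP 508 parser; the function name is illustrative):

```python
import re

# Treat a line as pinned if it has an exact version, a direct URL
# reference, or a --hash option.
PINNED = re.compile(r"==|@\s*\S+|--hash=")


def unpinned_requirements(text: str) -> list[str]:
    # Return requirement lines that may resolve to a different version
    # on each install -- the window a supply-chain attacker needs.
    flagged = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line or line.startswith("-"):
            continue  # options such as -r or --index-url
        if not PINNED.search(line):
            flagged.append(line)
    return flagged
```

As a harder mitigation, pip's real `--require-hashes` mode refuses to install anything not pinned with a hash, turning this check into an enforced policy.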
Watchlist
- Allowing "pip install -r requirements.txt" by default leaves a supply-chain attack surface when dependencies are unpinned, since each install may resolve to a newly published (possibly malicious) version.
Unknowns
- How often does the action-review classifier incorrectly allow risky actions (false negatives) in real-world use, especially under ambiguous instructions or partial context?
- What is the false-positive rate (unnecessary blocks/soft-denies) for common developer workflows, and what are the highest-friction categories?
- What are the exact semantics and expressiveness of user-defined rules (what can be constrained, how rules are evaluated, and whether rules are auditable/versioned)?
- How does the system define and manage "trusted" versus "untrusted" infrastructure targets, and can this be configured per organization?
- Does allowing dependency installation by default lead to measurable supply-chain exposure when requirements are unpinned, and are there planned or existing mitigations (e.g., requiring lockfiles)?