Default Guardrails: Scope Definition And Soft-Deny Policy
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:53
Key takeaways
- Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
- Simon Willison is unconvinced by AI-based prompt-injection protections, because they are non-deterministic.
- Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
- The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
- One concern raised is that allowing "pip install -r requirements.txt" by default may not protect against supply-chain attacks when dependencies are unpinned.
Sections
Default Guardrails: Scope Definition And Soft-Deny Policy
- Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
- Claude Code auto mode defaults define "project scope" as the repository where the session started and treat access to locations like ~/, ~/Library/, /etc, or other repositories as scope escalation rather than a permitted local operation.
- Claude Code auto mode defaults soft-deny higher-risk actions, including destructive Git operations (such as force push), pushing directly to the default branch, downloading and executing external code (such as curl | bash or unsafe deserialization), and mass deletion in cloud storage.
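The "project scope" rule above can be illustrated with a small path check. This is a minimal sketch, not Claude Code's actual implementation: the function name is_within_project_scope and the example repository path are hypothetical, and the real defaults are applied by a classifier rather than by code like this.

```python
from pathlib import Path


def is_within_project_scope(target: str, repo_root: str) -> bool:
    """Return True if `target` resolves to a location inside `repo_root`.

    Hypothetical helper for illustration only. Relative targets are
    interpreted relative to the repository root.
    """
    root = Path(repo_root).expanduser().resolve()
    path = Path(target).expanduser()
    if not path.is_absolute():
        path = root / path
    # resolve() collapses ".." segments and follows symlinks where they exist,
    # so "repo/../other" is not mistaken for an in-scope path.
    resolved = path.resolve()
    return resolved == root or root in resolved.parents


if __name__ == "__main__":
    repo = "/tmp/example-repo"  # hypothetical session start directory
    for candidate in ["src/main.py", "/etc/passwd", "../other-repo/x", "~/"]:
        verdict = "in scope" if is_within_project_scope(candidate, repo) else "scope escalation"
        print(f"{candidate}: {verdict}")
```

Under these assumptions, paths such as /etc, ~/ or a sibling repository fall outside the root and would be treated as scope escalation, matching the behavior the defaults describe.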
Limitations And Competing Control Philosophy (AI Classifier Vs Deterministic Sandbox)
- Simon Willison is unconvinced by AI-based prompt-injection protections, because they are non-deterministic.
- Claude Code auto mode documentation acknowledges the classifier may allow risky actions when user intent is ambiguous or context is insufficient.
- The stated security preference is for coding agents to run in a robust sandbox by default, one that deterministically restricts file access and network connections, rather than relying on prompt-based protections such as auto mode.
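To make the contrast concrete, here is a minimal sketch of what "deterministic" means for a network restriction: the same URL always yields the same decision, with no model in the loop. The allowlist contents and the function name is_host_allowed are assumptions chosen for illustration; a real sandbox would enforce this at the operating-system or network layer, not in application code.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; the hosts are illustrative, not a recommended policy.
ALLOWED_HOSTS = frozenset({"pypi.org", "files.pythonhosted.org"})


def is_host_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow a connection only if the URL's host is exactly on the allowlist."""
    host = urlparse(url).hostname  # lowercased by urlparse; None if absent
    return host is not None and host in allowed


if __name__ == "__main__":
    for url in ["https://pypi.org/simple/requests/", "https://evil.example/install.sh"]:
        print(url, "->", "allow" if is_host_allowed(url) else "deny")
```

A rule like this cannot be talked out of its decision by content in a file or web page, which is the property the sandbox preference is after. The trade-off is that it also cannot weigh intent or context the way a classifier can.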
Auto-Permissioning Mode Introduction
- Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
- In Claude Code auto mode, Claude makes permission decisions on the user's behalf while safeguards monitor actions before they run.
Classifier-Mediated Action Review And Model Boundary
- The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
- Before each action in auto mode, a separate classifier model reviews the conversation and can block actions that exceed task scope, target untrusted infrastructure, or appear driven by hostile content encountered in files or web pages.
Supply-Chain Risk Watch Item: Dependency Installation Defaults
- One concern raised is that allowing "pip install -r requirements.txt" by default may not protect against supply-chain attacks when dependencies are unpinned.
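As an illustration of what "unpinned" means in practice, the sketch below flags requirement lines that do not fix an exact version. It is a simplified check under stated assumptions: the function name find_unpinned is hypothetical, the parsing covers only common requirements.txt forms, and it is not part of Claude Code or pip.

```python
import re

# An exact pin looks like "name==1.2.3" (optionally with extras such as
# "name[extra]==1.2.3"). Wildcards like "==4.2.*" are not exact pins.
PINNED = re.compile(r"[A-Za-z0-9._-]+(\[[^\]]*\])?\s*===?\s*[^\s*]+")


def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned to one exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (simplified)
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        # Ignore environment markers and per-requirement options when matching.
        spec = line.split(";", 1)[0].split(" --", 1)[0].strip()
        if not PINNED.fullmatch(spec):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    sample = "requests==2.31.0\nflask\nnumpy>=1.26  # floating\n"
    print(find_unpinned(sample))
```

Exact version pins narrow the risk but do not remove it; pip's --require-hashes option goes further by refusing any package whose hash is not listed in the requirements file.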
Watchlist
- One concern raised is that allowing "pip install -r requirements.txt" by default may not protect against supply-chain attacks when dependencies are unpinned.
Unknowns
- What are the observed false-positive and false-negative rates of the action-review classifier in typical development workflows (including under ambiguous user intent)?
- How are "untrusted infrastructure" and "task scope" operationally defined for the classifier, and are those definitions exposed or inspectable to users/admins?
- What controls exist (if any) to enforce dependency pinning/lockfiles or to restrict network access during installs when auto mode permits dependency installation workflows?
- What visibility and audit artifacts are produced for each permitted/blocked action decision (e.g., logs that record rationale, policy version, and classifier outputs)?
- How frequently is the fixed classifier model (Sonnet 4.6) updated or changed, and what change-management process exists for those updates?