Default Guardrails: Scope Definition And Soft-Deny Policy

Issue 83 • Edition 2026-03-24 • 6 min read

General

Sources: 1 • Confidence: High • Updated: 2026-04-13 03:53

Key takeaways

Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
Simon Willison is unconvinced by prompt-injection protections that rely on AI, arguing that their non-determinism makes them unreliable.
Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
One concern raised is that allowing "pip install -r requirements.txt" by default may leave users exposed to supply-chain attacks when dependencies are unpinned.

Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
By default, Claude Code auto mode defines "project scope" as the repository where the session started. Access to locations such as ~/, ~/Library/, /etc, or other repositories is treated as scope escalation rather than as a permitted local operation.
By default, Claude Code auto mode soft-denies higher-risk actions, including destructive Git operations (such as force push), pushing directly to the default branch, downloading and executing external code (such as curl | bash or unsafe deserialization), and mass deletion of cloud storage.
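
The scope rule and two of the soft-deny categories above can be sketched as simple deterministic checks. This is an illustration only: in auto mode these defaults are applied by a classifier model, not by code like this, and the names in_project_scope and soft_deny_reasons, the regular expressions, and the example paths are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical patterns, loosely modeled on two of the soft-deny categories
# described above. They are NOT Claude Code's actual rules.
SOFT_DENY_PATTERNS = {
    "force push": re.compile(r"\bgit\s+push\b.*(\s--force\b|\s-f\b)"),
    "download and execute": re.compile(r"\b(curl|wget)\b[^|]*\|\s*(ba)?sh\b"),
}


def in_project_scope(target: str, repo_root: str) -> bool:
    """Return True if target resolves to repo_root or a path inside it."""
    root = Path(repo_root).expanduser().resolve()
    path = Path(target).expanduser()
    if not path.is_absolute():
        # Relative paths are interpreted relative to the repository root.
        path = root / path
    path = path.resolve()  # normalizes ".." segments and follows symlinks
    # Compare whole path components, not string prefixes, so that
    # "/work/repo-2" is not mistaken for a child of "/work/repo".
    return path == root or root in path.parents


def soft_deny_reasons(command: str) -> list[str]:
    """Return the names of the hypothetical soft-deny patterns a command matches."""
    return [name for name, pattern in SOFT_DENY_PATTERNS.items()
            if pattern.search(command)]
```

With a session rooted at /tmp/example-repo, in_project_scope("src/main.py", "/tmp/example-repo") is true, while /etc/hosts and ../other-repo/README.md both fall outside the scope. Pattern matching of this kind is deterministic but easy to evade, which is part of why the defaults described here rely on a classifier instead.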

Simon Willison is unconvinced by prompt-injection protections that rely on AI, arguing that their non-determinism makes them unreliable.
Claude Code auto mode documentation acknowledges the classifier may allow risky actions when user intent is ambiguous or context is insufficient.
Willison's stated preference is for coding agents to run in a robust default sandbox that deterministically restricts file access and network connections, rather than relying on prompt-based protections like auto mode.

Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
In Claude Code auto mode, Claude makes permission decisions on the user's behalf while safeguards monitor actions before they run.

The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
Before each action in auto mode, a separate classifier model reviews the conversation and can block actions that exceed task scope, target untrusted infrastructure, or appear driven by hostile content encountered in files or web pages.

One concern raised is that allowing "pip install -r requirements.txt" by default may leave users exposed to supply-chain attacks when dependencies are unpinned.
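
To make the pinning concern concrete, here is a minimal sketch of a check a team could run before permitting an install. It assumes that "pinned" means an exact == version with no wildcard. The function name unpinned_requirements and the regular expression are hypothetical, and the parser is deliberately simplified: it skips option lines such as -r or -e rather than analyzing them, and it does not handle line continuations.

```python
import re

# Simplified notion of "pinned": a package name, optional extras, an exact
# "==" version without a wildcard, and an optional environment marker.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s*;]+(\s*;.*)?$"
)


def unpinned_requirements(text: str) -> list[str]:
    """Return requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            continue  # option lines (-r, -e, --index-url, ...) are not analyzed
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

For a file containing requests==2.31.0, flask>=2.0, numpy, and pandas==2.*, the function reports the last three lines as unpinned. Exact version pins narrow the exposure but do not guarantee the downloaded artifact is the expected one; pip's --require-hashes mode is a stricter option.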

What are the observed false-positive and false-negative rates of the action-review classifier in typical development workflows (including under ambiguous user intent)?
How are "untrusted infrastructure" and "task scope" operationally defined for the classifier, and are those definitions exposed or inspectable to users/admins?
What controls exist (if any) to enforce dependency pinning/lockfiles or to restrict network access during installs when auto mode permits dependency installation workflows?
What visibility and audit artifacts are produced for each permitted/blocked action decision (e.g., logs that record rationale, policy version, and classifier outputs)?
How frequently is the fixed classifier model (Sonnet 4.6) updated or changed, and what change-management process exists for those updates?

Vendors offering deterministic sandboxing and policy-as-code controls may see increased interest as teams compare non-deterministic AI classifiers to deterministic execution isolation.
Products focused on software supply-chain security and dependency governance could be pulled into auto-permissioning workflows, given concern about allowing unpinned requirements installs by default.
Audit and compliance tooling for agent actions may gain urgency if organizations need inspectable logs of permitted or blocked decisions, including rationale, policy version, and classifier outputs.

Disclosures or docs showing classifier false-positive and false-negative rates in typical developer workflows and under ambiguous intent, plus public benchmarks over time.
Product updates adding enforceable dependency pinning or lockfile requirements, or controls to restrict network access during installs in auto mode.
Availability of detailed, exportable audit artifacts per decision, including recorded rationale and policy versioning, and clear definitions of untrusted infrastructure and task scope.

Evidence that the classifier delivers consistently low false negatives in ambiguous contexts and that reliance on a fixed classifier model does not create material risk or operational issues.
Demonstrated controls already prevent supply-chain exposure from unpinned dependency installs in auto mode, with clear enforcement mechanisms and limited network permissions.
Organizations show limited demand for additional sandboxing or audit tooling, indicating default guardrails and customization are sufficient for most deployments.