Rosa Del Mar

Daily Brief

Issue 83 2026-03-24

Default Guardrails: Scope Definition And Soft-Deny Policy

General
Sources: 1 • Confidence: High • Updated: 2026-04-13 03:53

Key takeaways

  • Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
  • Simon Willison is unconvinced that AI-based prompt-injection protections are reliable, because non-deterministic models offer no guarantee of blocking the same attack every time.
  • Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
  • The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
  • A raised concern is that a default allow rule for "pip install -r requirements.txt" leaves sessions exposed to supply-chain attacks when dependencies are unpinned.

Sections

Default Guardrails: Scope Definition And Soft-Deny Policy

  • Claude Code auto mode ships with extensive default filters and also allows users to customize them with their own rules.
  • Claude Code auto mode defaults define "project scope" as the repository where the session started and treat access to locations like ~/, ~/Library/, /etc, or other repositories as scope escalation rather than a permitted local operation.
  • Claude Code auto mode defaults soft-deny higher-risk actions including destructive Git operations (such as force push), pushing directly to the default branch, downloading-and-executing external code (such as curl | bash or unsafe deserialization), and cloud storage mass deletion.
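
The "project scope" rule above can be approximated deterministically. A minimal sketch, assuming a simple resolve-and-compare check; the function name and logic are illustrative, not Claude Code's actual implementation:

```python
from pathlib import Path

def is_in_scope(target: str, repo_root: str) -> bool:
    """Illustrative scope check: True only if `target` resolves inside the
    repository the session started in. Anything outside (~/, /etc, other
    repos) counts as scope escalation in the spirit of the defaults above."""
    resolved = Path(target).resolve()
    root = Path(repo_root).resolve()
    try:
        resolved.relative_to(root)  # raises ValueError if outside root
    except ValueError:
        return False
    return True
```

Resolving both paths first means symlinks and `..` segments cannot be used to smuggle a path out of the repository lexically.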

Limitations And Competing Control Philosophy (AI Classifier vs. Deterministic Sandbox)

  • Simon Willison is unconvinced that AI-based prompt-injection protections are reliable, because non-deterministic models offer no guarantee of blocking the same attack every time.
  • Claude Code auto mode documentation acknowledges the classifier may allow risky actions when user intent is ambiguous or context is insufficient.
  • A stated security preference is for coding agents to run in a robust default sandbox that deterministically restricts file access and network connections rather than relying on prompt-based protections like auto mode.
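
The contrast can be made concrete: a deterministic control returns the same verdict for the same input every time, with no model in the loop. A minimal sketch of an allowlist-style network check; the hostnames are illustrative assumptions, not any real policy:

```python
# Deterministic network policy sketch: identical input always yields the
# identical verdict, unlike a model-based classifier. Hostnames below are
# illustrative assumptions only.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}

def allow_connection(host: str) -> bool:
    """Allow only exact matches against a fixed allowlist; deny everything else."""
    return host.lower() in ALLOWED_HOSTS
```

The trade-off is flexibility: a fixed allowlist cannot reason about intent, which is exactly what the classifier approach attempts to add.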

Auto-Permissioning Mode Introduction

  • Claude Code introduced an "auto mode" permissions setting as an alternative to using --dangerously-skip-permissions.
  • In Claude Code auto mode, Claude makes permission decisions on the user's behalf while safeguards monitor actions before they run.

Classifier-Mediated Action Review And Model Boundary

  • The action-review classifier in Claude Code auto mode runs on Claude Sonnet 4.6 even when the main Claude Code session uses a different model.
  • Before each action in auto mode, a separate classifier model reviews the conversation and can block actions that exceed task scope, target untrusted infrastructure, or appear driven by hostile content encountered in files or web pages.
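
The per-action flow described above can be sketched as a pre-execution hook. Everything here is illustrative: the real classifier, its verdict schema, and these function names are internal to Claude Code and not public.

```python
def review_and_run(action, conversation, classify, execute):
    """Run `action` only if the (hypothetical) review classifier permits it.

    `classify` stands in for the separate review model: it sees the full
    conversation plus the proposed action and returns "allow" or "block".
    """
    verdict = classify(conversation, action)
    if verdict == "block":
        # Covers the described block reasons: exceeds task scope, targets
        # untrusted infrastructure, or is driven by hostile file/web content.
        return None
    return execute(action)
```

With stub callables in place of the model, the flow can be exercised end to end: a blocking verdict short-circuits before `execute` is ever called.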

Supply-Chain Risk Watch Item: Dependency Installation Defaults

  • A raised concern is that a default allow rule for "pip install -r requirements.txt" leaves sessions exposed to supply-chain attacks when dependencies are unpinned.
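
The exposure is easy to check for mechanically. A minimal sketch that flags requirements entries lacking an exact pin; it ignores hashes, environment markers, and VCS/URL requirements, which a real check (e.g. pip's --require-hashes mode) would also need to handle:

```python
import re

# Matches only exact pins like "requests==2.31.0"; anything else is flagged.
PINNED = re.compile(r"^[A-Za-z0-9._-]+==\S+$")

def unpinned(lines):
    """Return the requirements.txt lines that lack an exact == pin."""
    reqs = [l.strip() for l in lines if l.strip() and not l.startswith("#")]
    return [r for r in reqs if not PINNED.match(r)]
```

A check like this could gate an install step in CI, independent of any agent-side permission policy.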

Watchlist

  • A raised concern is that a default allow rule for "pip install -r requirements.txt" leaves sessions exposed to supply-chain attacks when dependencies are unpinned.

Unknowns

  • What are the observed false-positive and false-negative rates of the action-review classifier in typical development workflows (including under ambiguous user intent)?
  • How are "untrusted infrastructure" and "task scope" operationally defined for the classifier, and are those definitions exposed or inspectable to users/admins?
  • What controls exist (if any) to enforce dependency pinning/lockfiles or to restrict network access during installs when auto mode permits dependency installation workflows?
  • What visibility and audit artifacts are produced for each permitted/blocked action decision (e.g., logs that record rationale, policy version, and classifier outputs)?
  • How frequently is the fixed classifier model (Sonnet 4.6) updated or changed, and what change-management process exists for those updates?

Investor overlay

Read-throughs

  • Vendors offering deterministic sandboxing and policy-as-code controls may see increased interest as teams compare non-deterministic AI classifiers to deterministic execution isolation.
  • Products focused on software supply-chain security and dependency governance could be pulled into auto-permissioning workflows, given concern about allowing unpinned requirements installs by default.
  • Audit and compliance tooling for agent actions may gain urgency if organizations need inspectable logs of permitted or blocked decisions, including rationale, policy version, and classifier outputs.

What would confirm

  • Disclosures or docs showing classifier false-positive and false-negative rates in typical developer workflows and under ambiguous intent, plus public benchmarks over time.
  • Product updates adding enforceable dependency pinning or lockfile requirements, or controls to restrict network access during installs in auto mode.
  • Availability of detailed, exportable audit artifacts per decision, including recorded rationale and policy versioning, and clear definitions of untrusted infrastructure and task scope.

What would kill

  • Evidence that the classifier delivers consistently low false negatives in ambiguous contexts and that reliance on a fixed classifier model does not create material risk or operational issues.
  • Demonstrated controls already prevent supply-chain exposure from unpinned dependency installs in auto mode, with clear enforcement mechanisms and limited network permissions.
  • Organizations show limited demand for additional sandboxing or audit tooling, indicating default guardrails and customization are sufficient for most deployments.

Sources

  1. 2026-03-24 simonwillison.net