Rosa Del Mar

Daily Brief

Issue 78 2026-03-19

Government Contracting Conflict: Surveillance And All-Lawful-Use Terms

Sources: 1 • Confidence: Medium • Updated: 2026-03-25 17:58

Key takeaways

  • The government has pressured companies not to do business with Anthropic.
  • Distillation can extract high-value behavioral and reasoning traces from a frontier model via targeted interaction, making it a qualitatively different and more efficient data source than reconstructing the original pretraining corpus.
  • For Zvi Mowshowitz's usage, Gemini is not broadly comparable to Claude and GPT, and the larger Gemini models add little value beyond Gemini Flash.
  • Assuming "no one would implement" unsafe interpretability-training approaches is unreliable because someone likely will.
  • A warning sign for approaching endgame dynamics is a collapse in time-to-next-meaningful-model-release from months to weeks to days.

Sections

Government Contracting Conflict: Surveillance And All-Lawful-Use Terms

  • The government has pressured companies not to do business with Anthropic.
  • The primary red line in the Anthropic–Department of Defense dispute was domestic mass surveillance rather than autonomous weapons.
  • A disputed contract point in the Anthropic–Department of Defense conflict centered on a demand for "all lawful use" terms, viewed by Anthropic as enabling domestic mass surveillance.
  • Anthropic's annual recurring revenue grew from roughly $100M to $1B to $9B, and then from $9B to $19B since the start of the year, all prior to the dispute.
  • A surveillance mechanism enabled by AI is analysis and de-anonymization across vast commercial and classified datasets, enabling detailed inference about individuals' lives at scale.
  • The Department of Defense issued an official memo stating it must push forward with AI even if it is not aligned.

Frontier Competition And The Importance Of Scaffolding

  • Distillation can extract high-value behavioral and reasoning traces from a frontier model via targeted interaction, making it a qualitatively different and more efficient data source than reconstructing the original pretraining corpus.
  • Large compute spend alone is not sufficient to keep up at the frontier in the current regime because execution and talent still matter, with xAI and Meta cited as lagging examples.
  • Of the current top-tier labs, Google is the one most at risk of dropping out of the frontier tier.
  • In recursive self-improvement scenarios, progress becomes roughly proportional to compute allocation because human research talent stops being the binding constraint.
  • Recursive self-improvement via strong coding agents and scaffolding (such as Claude Code and Codex) is becoming decisive for competitiveness.
  • The set of frontier "live players" has narrowed to three labs: Anthropic, OpenAI, and Google.
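The distillation claim above hinges on a teacher model's outputs carrying far more signal per example than raw text. A minimal, self-contained sketch of the standard soft-label (Hinton-style) distillation loss illustrates why; all names and constants here are illustrative, not drawn from any lab's actual pipeline:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened probability distribution over logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions.
    The teacher's full output distribution carries much more
    information per query than a single hard label, which is why
    targeted interaction with a frontier model is a qualitatively
    more efficient data source than re-collecting its pretraining
    corpus."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student whose logits match the teacher's incurs zero loss;
# divergent logits incur a positive penalty.
matched = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
diverged = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

In practice the querying party chooses the prompts, so it can concentrate on exactly the behaviors it wants to copy, which is what makes this loss cheap to feed at scale.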

Google Model Usability Versus Benchmarks (Disagreement And Adoption Risk)

  • For Zvi Mowshowitz's usage, Gemini is not broadly comparable to Claude and GPT, and the larger Gemini models add little value beyond Gemini Flash.
  • Gemini, Claude, and GPT are broadly comparable for Nathan Labenz's practical decision-support tasks.
  • Gemini Flash is the best very-fast model for direct Q&A where it already knows the answer.
  • Gemini tends to struggle when challenged and can respond with low-value verbose output.
  • Google's Gemini models have problematic behavior and poor user experience despite benchmark strength, and Google lacks strong scaffolding efforts for agentic workflows.
  • Repeated product disappointments can compound Google's ecosystem losses by reducing users' willingness to retry its products, including in coding-agent stacks.

Interpretability-In-Training: Evasion Risk And Governance Boundary Dispute

  • Assuming "no one would implement" unsafe interpretability-training approaches is unreliable because someone likely will.
  • Naively backpropagating through interpretability probes can train models to evade detectors, and Anthropic is aware of this risk.
  • Anthropic has a proof-of-concept method where a hallucination detector is run on a frozen copy of the model to generate a penalty signal used to train the live model, aiming to reduce detector-evasion incentives.
  • Publishing interpretability-guided training techniques on small models will predictably lead others to apply them to larger frontier models where they become dangerous.
  • Interpretability signals should not be used as an input to the training process, even if intentional shaping of behavior is acceptable.
  • It is too coarse to categorically reject interpretability-in-training; given time constraints, partial tools may need cautious use while avoiding obviously harmful implementations.
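The evasion-incentive distinction in the bullets above can be made concrete with a toy model. The sketch below is purely illustrative (a one-parameter "model" and a fixed linear probe; every name is hypothetical) and is not Anthropic's actual method; it only shows why backpropagating through a probe on the live model creates an evasion gradient, while using a frozen copy's probe score as a constant penalty does not:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy one-parameter "model": activation a = w * X.
# A fixed probe d(a) = sigmoid(K * a) flags high activations.
K = 2.0       # probe weight (fixed, illustrative)
X = 1.0       # input
TARGET = 3.0  # task target for the activation

def activation(w):
    return w * X

def task_grad(w):
    # Gradient of the task loss 0.5 * (a - TARGET)^2 w.r.t. w.
    return (activation(w) - TARGET) * X

def probe(a):
    return sigmoid(K * a)

def naive_grad(w, lam=1.0):
    """Backprop THROUGH the probe on the live model: the gradient
    contains an extra term that pushes the activation to fool the
    probe (detector evasion)."""
    d = probe(activation(w))
    evasion_term = lam * d * (1.0 - d) * K * X  # d(probe)/dw
    return task_grad(w) + evasion_term

def frozen_grad(w, w_frozen, lam=1.0):
    """Probe runs on a FROZEN copy: its score is a constant that
    merely reweights the task gradient, so no gradient flows into
    evading the probe."""
    penalty = probe(activation(w_frozen))  # constant w.r.t. w
    return (1.0 + lam * penalty) * task_grad(w)
```

At the task optimum (w = 3.0 here, where the task gradient is zero), the naive scheme still pushes the weights, purely to suppress the probe's score; the frozen-copy scheme leaves them alone. That residual push is the evasion incentive the bullets warn about.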

Phase-Transition Criteria For Endgame Dynamics

  • A warning sign for approaching endgame dynamics is a collapse in time-to-next-meaningful-model-release from months to weeks to days.
  • Another outside-the-lab warning sign for endgame dynamics is overt real-world transformation such as robot factory buildouts, large-scale physical projects, mass job disruption, and major government interventions becoming central public issues.
  • A practical criterion for the AI "endgame" is when AIs, rather than humans, are largely driving further AI development.
  • The current period is the beginning of the "middle game" because AI-driven iteration is accelerating but humans remain meaningfully in control and the world is not yet transforming in unmistakable ways.
  • An additional endgame criterion is when the human operator becomes largely interchangeable and top-tier human skill is no longer required for high performance.

Watchlist

  • A warning sign for approaching endgame dynamics is a collapse in time-to-next-meaningful-model-release from months to weeks to days.
  • Another outside-the-lab warning sign for endgame dynamics is overt real-world transformation such as robot factory buildouts, large-scale physical projects, mass job disruption, and major government interventions becoming central public issues.
  • The government has pressured companies not to do business with Anthropic.
  • Assuming "no one would implement" unsafe interpretability-training approaches is unreliable because someone likely will.
  • Zvi frames DeepSeek V4 as a decisive upcoming test of whether leading Chinese labs can compete in the top frontier league and notes it appears delayed relative to expectations.
  • Zvi claims the government's response escalated broadly instead of simply ending the disputed work, which he suggests indicates motives such as retaliation or negotiation leverage.

Unknowns

  • Do official government documents substantiate the claim that the Department of Defense issued a memo stating it must push forward with AI even if it is not aligned?
  • What is the verifiable source for the claimed Anthropic ARR trajectory and the timing relative to the dispute?
  • What specific contract language, procurement templates, or RFP boilerplate exists regarding "all lawful use" terms across government AI contracts?
  • Is there documentary evidence (litigation filings, declarations, procurement records) that the government pressured third parties not to do business with Anthropic or interfered with customer contracts?
  • How should "middle game" versus "endgame" be operationalized in measurable organizational terms (share of autonomous agent contribution, time-to-iterate, reduced need for elite operators) across labs?

Investor overlay

Read-throughs

  • Government AI procurement may standardize broader "all lawful use" terms, raising compliance, reputational, and customer concentration risk for frontier model vendors and their partners.
  • Distillation and targeted interaction as efficient capability extraction could increase the value of closed deployment modes and monitoring, affecting platform strategy and partner access decisions.
  • If time between meaningful model releases collapses, compute, tooling, and deployment vendors may face faster demand cycles and higher volatility in adoption as customers chase the latest capability.

What would confirm

  • Public procurement templates, RFP boilerplate, or contract language showing "all lawful use" provisions becoming common across government AI contracts.
  • Verifiable records showing government pressure on third parties not to do business with a named vendor, such as procurement records, litigation filings, or declarations.
  • Observable shift to weeks or days between meaningful frontier releases, alongside visible real economy transformation like major government interventions or mass job disruption becoming central public issues.

What would kill

  • Document review shows no meaningful prevalence of "all lawful use" terms in government AI contracts, or clear carve-outs that limit surveillance-related applications.
  • No documentary evidence emerges supporting claims of third-party pressure tactics affecting customer contracts, and counterparties publicly deny interference with specifics.
  • Release cadence remains in the months range, and outside-the-lab indicators like large-scale physical buildouts or major government interventions do not materialize as central issues.

Sources