Rosa Del Mar

Daily Brief

Issue 93 2026-04-03

Compute Scarcity, Throttling, And The Economics Of Capacity

9 min read
General
Sources: 1 • Confidence: Medium • Updated: 2026-04-04 03:51

Key takeaways

  • Marc Andreessen claims some users are spending on the order of $1,000 per day on Claude tokens to run agent-like workloads.
  • Marc Andreessen defines an agent as an LLM connected to a bash-like shell plus a filesystem for state, using markdown files and a cron-like loop or heartbeat.
  • Marc Andreessen claims open source and edge inference become more important when centralized inference is capacity-constrained and when users want trust, privacy, latency, and price optimization from local models.
  • Marc Andreessen claims bots and cheap explosive drones create economic asymmetries where attacks are cheap but defense and verification are expensive, requiring new defensive technologies and approaches.
  • An unnamed speaker suggests model-provider lock-in via proprietary internal representations may be limited because a competing model could learn or reverse-engineer what another model produced.

Sections

Compute Scarcity, Throttling, And The Economics Of Capacity

  • Marc Andreessen claims some users are spending on the order of $1,000 per day on Claude tokens to run agent-like workloads.
  • Marc Andreessen claims that, because compute capacity is scarce, each incremental dollar spent deploying GPUs currently converts into revenue almost immediately.
  • Marc Andreessen claims current user-facing models are 'sandbagged' versions due to supply constraints, implying delivered capability is throttled by compute availability rather than by model capability.
  • Marc Andreessen claims software improvements are increasing the profitability and effective value of older inference chips, which he describes as historically unusual.
  • Marc Andreessen predicts chronic AI supply shortages for roughly the next three to four years with the broader supply chain largely sold out.
  • Marc Andreessen predicts that once AI supply constraints ease, industry growth will accelerate because products improve and costs fall.

Agent Architecture Thesis: Unix-Like Primitives, Portability, And Self-Extension

  • Marc Andreessen defines an agent as an LLM connected to a bash-like shell plus a filesystem for state, using markdown files and a cron-like loop or heartbeat.
  • Marc Andreessen argues that specialized tool-connection protocols are not necessary because exposing capabilities via command-line interfaces is sufficient.
  • Marc Andreessen argues that a pragmatic approach for new application waves is to liberate and extend the latent power of existing systems rather than reinventing the stack.
  • Marc Andreessen claims Pi plus OpenClaw represent an architectural breakthrough by combining the LLM paradigm with the Unix shell paradigm for building agents.
  • Marc Andreessen claims that storing agent state in files enables swapping the underlying LLM while preserving memories and capabilities, making agents more portable than any single model.
  • Marc Andreessen claims that because an agent can introspect and rewrite its own files, it can add new functions to itself with minimal human effort, enabling self-extension as a routine workflow.
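The architecture described in the bullets above can be sketched as a minimal heartbeat loop. Everything below is an illustrative assumption, not any specific product's implementation: `call_llm` is a placeholder for whatever model API is plugged in, and the `agent_state/*.md` file layout is hypothetical.

```python
import subprocess
import time
from pathlib import Path

STATE_DIR = Path("agent_state")          # the filesystem holds all durable agent state
MEMORY_FILE = STATE_DIR / "memory.md"    # markdown files as the state format
TASKS_FILE = STATE_DIR / "tasks.md"

def call_llm(prompt: str) -> str:
    """Hypothetical model call. Because state lives in the files, not in
    the model, the provider behind this function can be swapped freely."""
    raise NotImplementedError("plug in a model API client here")

def run_shell(command: str) -> str:
    """Expose capabilities as plain CLI invocations rather than through a
    specialized tool-connection protocol."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=60)
    return result.stdout + result.stderr

def heartbeat() -> None:
    """One tick of the cron-like loop: read state, ask the model what to
    do, execute any shell command it proposes, then write state back."""
    memory = MEMORY_FILE.read_text() if MEMORY_FILE.exists() else ""
    tasks = TASKS_FILE.read_text() if TASKS_FILE.exists() else ""
    reply = call_llm(f"Memory:\n{memory}\n\nTasks:\n{tasks}\n\n"
                     "Propose one shell command, or reply NOOP.")
    if reply.strip() != "NOOP":
        output = run_shell(reply)
        # Appending results to memory is also how self-extension works:
        # the agent may read and rewrite its own state files.
        STATE_DIR.mkdir(exist_ok=True)
        with MEMORY_FILE.open("a") as f:
            f.write(f"\n## {time.ctime()}\n$ {reply}\n{output}\n")

# Scheduling sketch: invoke heartbeat() on an interval, or from cron:
#   while True:
#       heartbeat()
#       time.sleep(300)
```

The portability claim falls out of this shape: swapping `call_llm` for a different provider preserves everything in `agent_state/`, so memories and accumulated capabilities outlive any single model.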

Open-Source And Edge Inference As A Response To Scarcity And Trust Requirements

  • Marc Andreessen claims open source and edge inference become more important when centralized inference is capacity-constrained and when users want trust, privacy, latency, and price optimization from local models.
  • Marc Andreessen claims open sourcing accelerates industry progress not only by distributing software but by revealing implementation details (papers and code) that enable rapid replication of breakthroughs like reasoning.
  • Marc Andreessen states that he is on PCAST and believes the current U.S. administration is supportive of AI and open-source AI, contrasting it with a prior administration he says was hostile.
  • Marc Andreessen argues Chinese AI firms may open-source models as a loss leader because they expect limited ability to sell commercial AI services in the U.S.
  • Shawn Wang claims that AI2 (the Allen Institute) collapsed and expresses pessimism about near-term U.S. open-source model labs relative to firms like Mistral.
  • Marc Andreessen predicts that hardware and software optimization will bring frontier-scale models onto consumer PCs within months of their being considered impractical to run locally.

Security And Identity: Agents Amplify Both Offense And Defense; Proof-Of-Human As A Control Response

  • Marc Andreessen claims bots and cheap explosive drones create economic asymmetries where attacks are cheap but defense and verification are expensive, requiring new defensive technologies and approaches.
  • Marc Andreessen states that a16z is a key participant in the World proof-of-human project and that he believes its approach is correct for addressing bots.
  • An unnamed speaker argues permissive 'YOLO' usage patterns are how early adopters discover both valuable capabilities and dangerous failure modes of agents.
  • Marc Andreessen claims some users are having AI agents scan local networks, identify insecure IoT devices, and take over control of home systems including cameras and access controls.
  • Alessio Fanelli suggests that using memory-safe-by-default languages like Rust could reduce the need to rely on models to avoid writing memory-unsafe code.
  • Marc Andreessen predicts a near-term computer security 'apocalypse' as agents expose latent vulnerabilities, followed by widespread automated remediation using coding agents.

Automation End-State Claims: Abundant Software, Declining UI Salience, And Decompilation Capabilities

  • An unnamed speaker suggests model-provider lock-in via proprietary internal representations may be limited because a competing model could learn or reverse-engineer what another model produced.
  • Marc Andreessen argues that the inefficiency of LLM computation versus specialized tools is acceptable because the payoff is broad general capability.
  • Marc Andreessen claims models are now able to reverse-engineer complex software binaries, enabling recovery of source-like representations where human reverse engineering would be prohibitively slow.
  • Marc Andreessen predicts software creation will shift from scarce human labor to effectively abundant automated generation, making language choice largely a preference that bots can translate or rewrite on demand.
  • Marc Andreessen predicts that if software is increasingly used by other bots rather than humans, conventional user interfaces and even browsers could become less necessary.
  • Marc Andreessen predicts that within roughly a decade, traditional programming languages may stop being a salient interface for building software as humans specify intent and ask AIs to explain implementations.

Watchlist

  • Marc Andreessen suggests there may be additional not-yet-understood scaling laws ahead for areas such as world models, robotics, and real-world data acquisition.
  • Shawn Wang suggests agent-to-agent interaction across social networks could introduce alignment and control risks if agents act autonomously.

Unknowns

  • What objective evidence supports or refutes the claim that current production models are materially throttled ('sandbagged') by compute scarcity rather than limited by model capability?
  • Is the predicted 3–4 year horizon for chronic AI supply shortages accurate, and what specific supply-chain components are binding (GPUs vs power vs networking vs memory)?
  • Do agent workloads in practice shift the bottleneck mix toward CPU, memory, and networking as claimed, and under what workload shapes (tool calls, browsing, retrieval, multi-agent orchestration)?
  • Are there measurable gains in revenue-per-GPU-hour for older inference chips due to software improvements, and how broadly does this apply across chip generations and workloads?
  • How often do app-layer AI products actually get displaced when foundation models absorb their differentiating features, and what features are most vulnerable to absorption?

Investor overlay

Read-throughs

  • If production LLM capability is constrained by inference capacity, pricing power and utilization for deployed inference infrastructure could stay unusually high, with software optimization potentially lifting revenue per GPU hour, including on older chips.
  • If agent workloads expand demand and shift bottlenecks beyond GPUs, spending could broaden toward CPUs, memory, networking, storage, and orchestration software that supports tool calls, retrieval, and multi-agent coordination.
  • If bot-driven offense and identity fraud accelerate, security and verification layers such as proof-of-human systems could become a gating requirement for agentic products, shifting budgets toward defensive tooling and identity infrastructure.

What would confirm

  • Objective indications that model outputs are being intentionally throttled by capacity limits, plus sustained high utilization and rising effective pricing for inference across multiple providers and workloads.
  • Clear workload evidence that agents increase non-GPU constraints such as higher CPU utilization, memory footprint, network egress, and storage I/O, with customers reporting these as primary scaling blockers.
  • Rising adoption of human verification and agent containment controls as default product requirements, with security incidents attributed to autonomous agents driving measurable increases in spend on defensive and identity tools.

What would kill

  • Evidence that delivered model capability is not materially limited by inference capacity, for example, performance improvements that arrive without additional capacity while utilization and pricing normalize quickly.
  • Agent deployments that remain GPU bound with minimal incremental demand for CPU, memory, networking, and storage, or that fail to scale beyond pilots due to cost or reliability issues.
  • Security outcomes improve without new verification layers, or proof-of-human approaches fail to gain adoption, reducing the urgency and budget shift toward identity and defensive agent controls.

Sources