Rosa Del Mar

Daily Brief

Issue 72 • 2026-03-13

Compute Procurement Contracts And Pricing

10 min read
General
Sources: 1 • Confidence: Medium • Updated: 2026-03-14 12:30

Key takeaways

  • Incremental compute capacity can be sourced by outbidding other customers as shorter-term GPU contracts roll off, even if much frontier-lab capacity is locked in via multi-year deals.
  • Fab construction and tooling capacity scale on roughly 2–3 year timelines, while data centers can be built in under a year, with Amazon reportedly completing some in about eight months.
  • The claim that older-node chips can cheaply substitute for leading-edge compute is challenged as misleading because system-level communication and memory/network architecture constrain real model performance.
  • ‘Critical IT’ data-center load understates required generation and grid capacity because transmission, conversion, and cooling losses plus reserve margins and turbine derating require meaningfully higher nameplate capacity than IT draw.
  • A large portion of hyperscaler AI CAPEX is pre-spend for future years (power agreements, turbine deposits, early construction) rather than compute coming online in the current year.

Sections

Compute Procurement Contracts And Pricing

  • Incremental compute capacity can be sourced by outbidding other customers as shorter-term GPU contracts roll off, even if much frontier-lab capacity is locked in via multi-year deals.
  • The Alchian–Allen effect implies that adding a fixed cost common to two options can shift marginal demand toward the higher-quality option by shrinking the effective relative price gap.
  • If GPU costs rise and token throughput differs across model tiers, the implied per-token price premium for a higher-quality model can shrink.
  • Labs that were conservative on long-term compute commitments may be forced to procure capacity from lower-quality providers and/or accept worse economics when ramping in a pinch.
  • Labs can access compute without directly owning it by having hyperscalers serve their models on hosted platforms in exchange for a revenue share or markup.
  • Locking in five-year compute contracts early can create a durable margin advantage for model vendors if renewal pricing later rises due to higher model value and scarce supply.
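The Alchian–Allen mechanism in the bullets above can be checked with toy numbers (all prices here are illustrative, not from the source): adding a cost common to both options compresses their price ratio, making the premium tier relatively cheaper.

```python
def relative_price(p_high: float, p_low: float, fixed: float) -> float:
    """Ratio of effective prices after adding a fixed cost common to both options."""
    return (p_high + fixed) / (p_low + fixed)

# Illustrative per-token prices: premium model at 10, budget model at 2.
no_overhead = relative_price(10.0, 2.0, 0.0)    # 5.0 -- premium costs 5x
with_overhead = relative_price(10.0, 2.0, 3.0)  # 2.6 -- the gap compresses

# Rising GPU costs act like the common fixed cost: the relative premium for
# the higher-quality model shrinks, nudging marginal demand toward it.
```

The same arithmetic explains the per-token price-premium bullet: if GPU costs rise uniformly, the implied premium for the higher-quality tier shrinks in relative terms.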

Semiconductor Supply Chain As Primary Bottleneck

  • Fab construction and tooling capacity scale on roughly 2–3 year timelines, while data centers can be built in under a year, with Amazon reportedly completing some in about eight months.
  • Producing one gigawatt of Rubin-class data-center capacity is claimed to require roughly 55,000 3nm wafers, 6,000 5nm wafers, and 170,000 DRAM wafers, totaling about 2 million EUV wafer passes.
  • Given an EUV tool throughput of about 75 wafers per hour and ~90% uptime, roughly 3.4 EUV tool-years (i.e., about 3.5 tools running continuously for a year) are needed to supply the ~2 million EUV wafer passes required for one gigawatt of leading-edge AI chips.
  • ASML’s ability to scale EUV output is constrained by long-lag expansion across complex subsystems (source, stages, optics) and a specialized supplier network, alongside industry reluctance to overbuild due to boom-bust history.
  • Memory makers did not build significant new fab capacity over the last 3–4 years due to low prices and losses, so meaningful new memory capacity likely cannot come online until late 2027 or 2028.
  • TSMC is described as not materially raising prices while memory vendors are raising prices sharply and signing long-term deals, shifting more margin capture toward memory suppliers.
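The EUV tool estimate above is straightforward throughput arithmetic and can be reproduced directly (the input figures are the ones claimed in the source; note the result is properly tool-years per gigawatt of output):

```python
WAFERS_PER_HOUR = 75        # claimed EUV tool throughput
UPTIME = 0.90               # claimed availability
HOURS_PER_YEAR = 24 * 365   # 8,760

EUV_PASSES_PER_GW = 2_000_000  # claimed passes for 1 GW of Rubin-class capacity

passes_per_tool_year = WAFERS_PER_HOUR * UPTIME * HOURS_PER_YEAR  # 591,300
tools_per_gw_year = EUV_PASSES_PER_GW / passes_per_tool_year      # ~3.4

# ~3.4 tool-years per gigawatt, consistent with the source's "about 3.5".
print(f"{tools_per_gw_year:.1f} EUV tool-years per gigawatt")
```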

System-Level Scaling Limits: Interconnect And Memory Bandwidth

  • The claim that older-node chips can cheaply substitute for leading-edge compute is challenged as misleading because system-level communication and memory/network architecture constrain real model performance.
  • AI system performance drops sharply as communication moves from on-chip to in-rack to off-rack because bandwidth and latency worsen by roughly an order of magnitude at each boundary.
  • The idea that slow parameter scaling is mainly explained by insufficient NVIDIA scale-up memory capacity is challenged as secondary to development-speed tradeoffs driven by RL and iteration time.
  • For some inference workloads around 100 tokens/second, Blackwell is claimed to be about 20× faster than Hopper despite FLOPs suggesting only ~2–3×.
  • NVIDIA expanded its scale-up domain from 8 GPUs per server (H100 era) to 72 GPUs per rack at terabytes-per-second speeds with Blackwell NVL72.
  • An HBM4 stack is estimated to provide about 2.5 TB/s bandwidth, while DDR5 over a similar chip-edge area would provide roughly 64–128 GB/s.
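The bandwidth cliff described above can be illustrated with a rough transfer-time model (the bandwidth figures below are order-of-magnitude placeholders chosen for illustration, not vendor specs): moving the same payload across each boundary costs roughly ten times more time.

```python
# Order-of-magnitude bandwidths in GB/s (illustrative placeholders).
TIERS = {
    "on-chip (HBM)":      8000,
    "in-rack (scale-up)":  900,
    "off-rack (network)":   50,
}

def transfer_time_s(payload_gb: float, bandwidth_gb_s: float) -> float:
    """Time to move a payload at a given bandwidth, ignoring latency."""
    return payload_gb / bandwidth_gb_s

PAYLOAD_GB = 10.0  # e.g. a gradient shard, for illustration
for tier, bw in TIERS.items():
    print(f"{tier:>20}: {transfer_time_s(PAYLOAD_GB, bw) * 1e3:8.2f} ms")
```

Latency worsens at each boundary as well, so the real penalty for leaving the rack is larger than this bandwidth-only model suggests.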

Power Buildout Constraints And Mitigations

  • ‘Critical IT’ data-center load understates required generation and grid capacity because transmission, conversion, and cooling losses plus reserve margins and turbine derating require meaningfully higher nameplate capacity than IT draw.
  • Skilled construction labor (especially electricians and plumbers) is a major constraint on data-center power buildouts and will likely require training, immigration of skilled workers, and industrial modularization to scale.
  • In a chip-constrained environment, minimizing time-to-deployment becomes a dominant objective, favoring modularized data centers and racks so new chips can start producing tokens immediately.
  • GPU hardware failure management is a key differentiator among cloud providers because GPUs are unreliable in practice, with around 15% of deployed Blackwell units reportedly requiring RMA.
  • Power for AI can scale through many channels beyond combined-cycle turbines (aeroderivative turbines, reciprocating engines, ship engines, fuel cells, solar-plus-battery), making the power supply chain structurally simpler than the chip supply chain.
  • Even substantially higher power CAPEX is economically tolerable because energy is a small fraction of GPU total cost of ownership, so higher power prices only modestly increase per-hour GPU cost.
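The TCO point in the last bullet follows from simple amortization arithmetic (all dollar and wattage figures below are illustrative assumptions, not source numbers): because hardware dominates hourly cost, even a 3x power price raises the total only modestly.

```python
GPU_CAPEX = 30_000          # $ per GPU system share (illustrative)
AMORT_YEARS = 4
HOURS = AMORT_YEARS * 8760  # 35,040 amortization hours
POWER_KW = 1.2              # GPU plus facility overhead draw (illustrative)

capex_per_hour = GPU_CAPEX / HOURS  # ~$0.86/hr

def hourly_cost(price_per_kwh: float) -> float:
    """Amortized hardware cost plus energy cost per GPU-hour."""
    return capex_per_hour + POWER_KW * price_per_kwh

cheap = hourly_cost(0.05)   # ~$0.92/hr
pricey = hourly_cost(0.15)  # ~$1.04/hr -- 3x the power price, ~13% higher total
```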

Near-Term Compute Supply Vs Demand

  • A large portion of hyperscaler AI CAPEX is pre-spend for future years (power agreements, turbine deposits, early construction) rather than compute coming online in the current year.
  • About 20 gigawatts of incremental data-center capacity is expected to be deployed in the US this year.
  • Some cloud providers are not meeting previously promised near-term capacity delivery due to data-center delays, even though much Blackwell capacity coming online this quarter is already sold.
  • Anthropic is expected to reach roughly 5–6 gigawatts of compute by year-end via a mix of its own compute plus partner-served capacity, while OpenAI is expected to end the year somewhat higher.
  • At about $10B per gigawatt-year of rented compute, around 4 gigawatts of inference capacity would be needed to support a large near-term revenue growth scenario discussed for Anthropic, excluding training expansion.
  • OpenAI and Anthropic each operate on the order of a few gigawatts of compute today and are attempting to scale materially higher.
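The capacity arithmetic above can be sketched as a back-of-envelope helper (the $10B per gigawatt-year figure is the source's; the revenue and cost-share inputs below are hypothetical):

```python
COST_PER_GW_YEAR = 10e9  # $ per gigawatt-year of rented compute (source figure)

def inference_gw_needed(annual_revenue: float, compute_cost_share: float) -> float:
    """Gigawatts of inference capacity implied by a revenue target, assuming a
    given fraction of revenue is spent on rented inference compute."""
    return annual_revenue * compute_cost_share / COST_PER_GW_YEAR

# Hypothetical inputs: $80B annual revenue with half spent on inference
# compute implies 4 GW, matching the "around 4 gigawatts" in the bullet.
gw = inference_gw_needed(80e9, 0.5)
```

Training expansion would come on top of this, as the bullet notes it is excluded.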

Unknowns

  • What is the actually commissioned (not just announced) incremental U.S. data-center capacity over the next 12 months, and how much of it is AI-accelerator load?
  • What are the true current and year-end compute footprints (in gigawatts and effective GPU-hours) for major AI labs, including how much is owned, reserved, and partner-served?
  • How prevalent are multi-year H100/Hopper contracts at the reported high price points, and how do their utilization and service-level terms compare across providers?
  • How large are the practical, real-world inference throughput gains from Hopper to Blackwell across representative serving setups (batching, context length, parallelism choices)?
  • What are the binding constraints for AI chip output through 2028: EUV tool count, HBM supply, advanced packaging capacity, or something else?

Investor overlay

Read-throughs

  • GPU pricing and access may be driven more by contract structure and renewal exposure than by headline hardware supply, allowing incumbent capacity holders to preserve economics while late buyers face higher costs and operational risk.
  • Primary AI scaling bottlenecks may shift upstream to semiconductors, memory, and advanced packaging with multi-year lead times, making short-term data center build speed less decisive than EUV, HBM, and packaging availability.
  • Power infrastructure constraints may be understated when using critical IT load, and hyperscaler AI CAPEX may not translate into same-year compute because meaningful spend is on long-lead prerequisites like power agreements and early construction.

What would confirm

  • Disclosures or channel checks showing multi-year accelerator contracts with high prices, strong utilization, and tight service level terms, plus renewals clearing at similar or higher effective rates for older and current GPUs.
  • Evidence that realized model performance and cost are constrained by interconnect and memory bandwidth, such as large efficiency drops off-rack and platform focus on expanding high-bandwidth scale-up domains.
  • Commissioning data showing modest near-term incremental data center capacity or low AI-accelerator share versus announced plans, alongside CAPEX mix shifting toward power deposits, transmission, turbines, and early site work.

What would kill

  • Widespread near-term availability of compute via short-term contracts rolling off without price pressure, enabling new entrants to secure large volumes at improving economics and reliability.
  • Clear proof that older-node or lower-spec systems deliver comparable real-world training and inference throughput at materially lower cost, without interconnect and memory bandwidth becoming binding in typical deployments.
  • Measured commissioning and grid buildout keeping pace with AI load, with critical IT load closely matching required nameplate capacity and hyperscaler CAPEX rapidly converting into operational accelerator megawatts.

Sources