Rosa Del Mar

Daily Brief

Issue 82 2026-03-23

System-Level Scaling And Co-Design At Rack/Factory Boundary

10 min read
General
Sources: 1 • Confidence: Medium • Updated: 2026-03-25 17:51

Key takeaways

  • In the corpus, Jensen Huang says his direct staff is about 60 people and he avoids one-on-one meetings in favor of group problem-solving to drive cross-domain co-design.
  • In the corpus, Jensen Huang says NVIDIA's primary moat is the installed base of CUDA reinforced by sustained investment and millions of developers porting large software stacks onto it.
  • In the corpus, Jensen Huang says NVIDIA coordinates supply-chain scaling by briefing many industry CEOs on near-term growth drivers and future direction to shape their investment plans.
  • In the corpus, Jensen Huang disputes the claim that AI will eliminate software and tools, arguing that effective agents will use existing tools and file systems to access ground truth and do external research.
  • In the corpus, Jensen Huang frames computers as shifting economically from warehouses to factories that generate revenue-producing token commodities with tiered pricing (free, mid-tier, premium).

Sections

System-Level Scaling And Co-Design At Rack/Factory Boundary

  • In the corpus, Jensen Huang says his direct staff is about 60 people and he avoids one-on-one meetings in favor of group problem-solving to drive cross-domain co-design.
  • In the corpus, Jensen Huang says extreme end-to-end co-design is necessary for AI systems because distributed workloads hit non-GPU bottlenecks consistent with Amdahl's-law limits (a worked example follows this list).
  • In the corpus, Jensen Huang says NVIDIA anticipates future AI needs through internal research/model-building, broad collaboration with AI companies, and a flexible CUDA-based architecture that can adapt to algorithm shifts.
  • In the corpus, Jensen Huang says the fundamental unit of compute has shifted from GPU to computer to cluster and is becoming the AI factory as the relevant system boundary.
  • In the corpus, Jensen Huang says Mixture-of-Experts inference drove a design shift toward NVLink-72 to keep multi-trillion-parameter models within one unified compute domain (see the footprint arithmetic after this list).
  • In the corpus, Jensen Huang says the Vera Rubin rack is materially different from prior Grace Blackwell racks, adding storage accelerators and a new CPU (Vera) to support agent workloads that use tools and data stores.
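
To make the Amdahl's-law point concrete, here is the standard formula (textbook material, not a corpus quote): if a fraction p of a workload runs in parallel across N accelerators and the remainder is serial CPU, network, or storage work, then

    S(N) = \frac{1}{(1 - p) + p/N}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1 - p}

With p = 0.95, infinite GPUs still cap out at a 20x speedup, which is why co-design targets the non-GPU five percent (CPU, interconnect, storage) rather than adding accelerators alone.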
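
A back-of-envelope check on the unified-domain claim (my arithmetic, not a corpus figure; 192 GB of HBM per GPU is an assumption for illustration):

    72 \times 192\ \text{GB} \approx 13.8\ \text{TB pooled HBM}, \qquad 10^{13}\ \text{params} \times 1\ \text{byte (FP8)} = 10\ \text{TB}

On those assumptions a 10-trillion-parameter model fits inside a single NVLink-72 domain with headroom for KV cache, so Mixture-of-Experts all-to-all routing stays on NVLink instead of crossing a slower inter-rack network.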

Platform Moat Via Install Base, Distribution, And Developer Expectations

  • In the corpus, Jensen Huang says NVIDIA's primary moat is the installed base of CUDA reinforced by sustained investment and millions of developers porting large software stacks onto it.
  • In the corpus, Jensen Huang says adding CUDA increased GeForce costs by roughly 50% and was associated with a large short-term decline in gross profit and market capitalization (he cites a drop from about $6–8B to about $1.5B).
  • In the corpus, Jensen Huang claims installed base is the most important factor in establishing a computing architecture, outweighing elegance or technical criticism.
  • In the corpus, Jensen Huang says NVIDIA shipped CUDA on GeForce to seed a large installed base even if customers did not use or pay for CUDA.
  • In the corpus, Jensen Huang attributes early CUDA discovery and adoption to PC-era accessibility where researchers and students could access GeForce GPUs and build commodity clusters, later contributing to deep learning progress.
  • In the corpus, Jensen Huang argues the CUDA moat compounds because developers expect step-function improvements within months and get distribution across hundreds of millions of systems and major clouds by targeting CUDA first.

Power, Availability SLAs, And Supply-Chain Integration As Scaling Bottlenecks

  • In the corpus, Jensen Huang says NVIDIA coordinates supply-chain scaling by briefing many industry CEOs on near-term growth drivers and future direction to shape their investment plans.
  • In the corpus, Jensen Huang says moving to NVLink-72 rack-scale systems shifted supercomputer integration from the datacenter to the manufacturing supply chain, requiring partners to build and test fully integrated racks before shipment and increasing supply-chain power needs.
  • In the corpus, Jensen Huang claims the power grid is sized for rare peak conditions and has substantial unused capacity most of the time that flexible datacenters could use.
  • In the corpus, Jensen Huang proposes that if utilities could curtail datacenter power during rare peak events, datacenters could respond by shifting workloads, running slower, or degrading latency while preserving data integrity (a control-loop sketch follows this list).
  • In the corpus, Jensen Huang says power is a key scaling concern for widespread agents and that NVIDIA intends to use extreme hardware-software co-design to improve tokens-per-second-per-watt by orders of magnitude each year.
  • In the corpus, Jensen Huang says a barrier to using flexible grid capacity is that customers contractually demand near-perfect datacenter availability, cascading high-availability requirements down to cloud providers and utilities.
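
The curtailment proposal above can be read as a simple control loop. The sketch below is hypothetical (nothing here is an NVIDIA or utility API); the shed fractions are invented for illustration.

    # Hypothetical curtailment responder: when the utility signals a rare peak,
    # shed power by deferring batch work, capping clocks, and relaxing latency
    # targets -- never by risking data integrity.
    from dataclasses import dataclass

    @dataclass
    class GridSignal:
        curtail_fraction: float  # 0.3 means "shed 30% of current draw"

    def respond_to_curtailment(signal: GridSignal) -> list[str]:
        actions, remaining = [], signal.curtail_fraction
        # 1. Defer interruptible batch jobs; checkpointed training restarts safely.
        shed = min(remaining, 0.20)  # assume batch jobs draw ~20% of power
        if shed > 0:
            actions.append(f"defer batch jobs (-{shed:.0%})")
            remaining -= shed
        # 2. Lower GPU power caps, trading throughput for watts.
        shed = min(remaining, 0.15)  # assume clock caps recover ~15%
        if shed > 0:
            actions.append(f"reduce power caps (-{shed:.0%})")
            remaining -= shed
        # 3. Relax latency SLOs on low-tier traffic; queues grow, data survives.
        if remaining > 0:
            actions.append(f"degrade latency tier (-{remaining:.0%})")
        return actions

    print(respond_to_curtailment(GridSignal(curtail_fraction=0.3)))
    # ['defer batch jobs (-20%)', 'reduce power caps (-10%)']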

Agentic Workloads: Tool Use, I/O, And Enterprise Security Gating

  • In the corpus, Jensen Huang disputes the claim that AI will eliminate software and tools, arguing that effective agents will use existing tools and file systems to access ground truth and do external research.
  • In the corpus, Jensen Huang presents an agentic security model that restricts systems to any two of three permissions: access to sensitive information, code execution, and external communication, alongside enterprise access controls and policy-engine integration (a toy permission check follows this list).
  • In the corpus, Jensen Huang says the Vera Rubin rack is materially different from prior Grace Blackwell racks, adding storage accelerators and a new CPU (Vera) to support agent workloads that use tools and data stores.
  • In the corpus, Jensen Huang says useful agentic systems must access ground-truth files, do external research, and use existing tools rather than rely only on internal knowledge.
  • In the corpus, Jensen Huang predicts an agentic scaling pattern where capability and output scale by spawning teams of sub-agents, with experiences feeding back into pre-training and post-training.
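
The "two of three" rule above reduces to a one-line policy check. A minimal sketch, assuming a hypothetical permissions object; the field names mirror the bullet, not any shipping NVIDIA API.

    # Toy gate for the two-of-three agent permission model: an agent may hold
    # at most two of {sensitive-data access, code execution, external comms}.
    from dataclasses import dataclass

    @dataclass
    class AgentPermissions:
        sensitive_data: bool
        code_execution: bool
        external_comm: bool

    def is_allowed(p: AgentPermissions) -> bool:
        granted = sum([p.sensitive_data, p.code_execution, p.external_comm])
        return granted <= 2  # all three together enables exfiltration

    # Browses and runs code but sees no secrets: allowed.
    print(is_allowed(AgentPermissions(False, True, True)))  # True
    # Secrets plus code execution plus network egress: rejected.
    print(is_allowed(AgentPermissions(True, True, True)))   # False

In practice a gate like this would sit inside the enterprise policy engine the bullet mentions, evaluated per task rather than per agent.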

Token Economics And AI Factory Monetization Framing

  • In the corpus, Jensen Huang frames computers as shifting economically from warehouses to factories that generate revenue-producing token commodities with tiered pricing (free, mid-tier, premium).
  • In the corpus, Jensen Huang claims AI computing scale increased about a million-fold over the last decade and that token costs are falling by about an order of magnitude per year even as system prices rise (see the compounding arithmetic after this list).
  • In the corpus, Jensen Huang forecasts that token prices will approach one thousand dollars per million tokens and that the share of GDP spent on computation will increase dramatically.
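
The scale and cost claims above imply straightforward compounding (my arithmetic, not Huang's):

    10^{6/10} = 10^{0.6} \approx 3.98 \quad \text{(annual growth factor for a million-fold decade)}, \qquad c_{t+1} = \tfrac{1}{10}\, c_t \quad \text{(a 90\% cost drop per year)}

At a 10x-per-year decline, serving a fixed token volume costs about one percent of today's bill two years out, which is what lets per-token prices fall while system prices and total spend rise.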

Watchlist

  • Watch whether NVIDIA's briefings to industry CEOs on near-term growth drivers and future direction continue to shape supply-chain investment plans.
  • In the corpus, Jensen Huang says he plans very soon to send an NVIDIA humanoid robot into space and later to transmit, at light speed, an AI built from his digitized communications to 'catch up' with it.

Unknowns

  • What fraction of frontier training corpora is synthetic data today, and how quickly is that fraction changing?
  • How compute-intensive are real-world production inference workloads as test-time reasoning and agentic behaviors increase (tokens per task, tool calls per task, wall-clock latency constraints, and FLOPs/task)?
  • Do leading deployments of MoE or multi-trillion-parameter models measurably benefit from single-domain scale-up fabrics like NVLink-72 versus multi-domain sharding, and what are the tradeoffs (latency, utilization, failure domains, cost)?
  • What is the empirical bill-of-materials and bottleneck profile for agentic workloads (CPU, storage, network, memory bandwidth) compared to LLM-only workloads?
  • Will enterprise agent frameworks and deployments adopt a permissioning model similar to 'two of three permissions' (sensitive data access, code execution, external communication), and will it be sufficient to enable production rollouts?

Investor overlay

Read-throughs

  • Rack and factory level co-design may shift value capture toward full-stack system vendors and integrated supply chains, not just single accelerators, as optimization targets expand to interconnect, CPU, storage, and rack-level fabrics.
  • Installed base and developer ecosystem lock-in may remain a durable moat, implying that distribution and broad deployment venues can outweigh short-term monetization, reinforcing platform dominance through compounding software porting costs.
  • If token production becomes a monetized factory output with tiered pricing, revenue models may broaden toward service-like pricing and availability SLAs, making power, uptime guarantees, and graceful degradation features economically central.

What would confirm

  • Public product and architecture roadmaps emphasize rack-level performance metrics, integrated fabrics, and co-designed CPU, interconnect, and storage, with customers benchmarking at rack and factory scale rather than per GPU.
  • Observable growth in developer adoption and software stack porting to the platform, plus enterprise tooling that assumes the platform by default, indicating ecosystem expectations are reinforcing the installed base moat.
  • Customer contracts and service offerings incorporate flexible availability terms such as rare curtailment tolerance, workload shifting, or latency tradeoffs, alongside tiered token or capacity pricing in commercial deployments.

What would kill

  • Real-world workloads fail to benefit from large single-domain scale-up fabrics at acceptable cost and reliability, pushing buyers toward multi-domain sharding and reducing the need for tightly integrated rack-level designs.
  • Enterprise agent deployments do not require extensive tool, file, and external research access, or governance models like controlled permissions prove insufficient, limiting I/O-centric infrastructure demand and associated rack redesign.
  • Power and grid constraints do not allow meaningful deployment growth, and customers reject flexible availability structures, keeping utilization low and undermining the factory framing and any tiered token monetization assumptions.

Sources