Computing Infrastructure and the Continuous Operation of Intelligence · Jensen Huang

2026-06-09 · A faithful, transcript-grounded reading by PodLens

Original episode: https://youtu.be/tsQB0n0YV3k?si=pC6HMVlFXJZKNqtO

Computing Infrastructure · Codesign · NVIDIA · Inference Infrastructure · Continuous Generative Computing

What This Episode Is About

NVIDIA founder and CEO Jensen Huang sat down with host Anj in Stanford's CS153 classroom. Huang described what he calls the most fundamental reshaping of computer science in 60 years. Computing is moving from an on-demand model built on retrieving pre-recorded content to real-time, generative, continuously running Agentic systems. He laid out the logic of extreme codesign across chips, compilers, software, and networking. He also explained how a 1-million-fold increase in computing power over 10 years underpins the data explosion of generative AI. The conversation also covered several other topics:
- the commercial and security case for open-source versus closed-source models;
- how the Vera Rubin architecture is tuned for low-latency Agent tool calls;
- why he attributes the compute shortage in university research to a coordination failure in fragmented funding rather than to chip supply.
Finally, Huang recounted how NVIDIA redirected its strategy toward robotics (Thor) after exiting mobile (Tegra). He also shared his view of resilience, that "90% of it is suffering," offering system-level guidance for engineers and decision-makers in an era of ubiquitous intelligence.

Timeline Topic Map

[00:08-01:04] Host Anj introduces and welcomes NVIDIA co-founder and CEO Jensen Huang back to the Stanford classroom.
[01:05-03:13] Exploring the fundamental reshaping of computing paradigms. Jensen Huang points out that computing is undergoing its biggest transformation in 60 years, shifting from "pre-recorded retrieval" to "real-time generation." AI no longer just responds to instructions; it understands intent and generates output consistent with the context.
[03:14-05:33] Discussing software development methodologies and the reorganization of corporate structures. AI brings a qualitative change in how software executes (neural networks rather than compiled binaries), and robotics applications such as self-driving are only truly unlocked by deep learning.
[05:34-07:12] Analyzing the logic of "reasoning" and "thought generation" after GPT. AI performs slow thinking by generating tokens it consumes internally, and calls tools by emitting tokens directed at external systems.
[07:13-08:20] Computing is migrating from "on-demand" cloud services to "continuously running" Agentic systems, bringing brand-new opportunities to hardware and software infrastructure.
[08:21-10:01] Tracing the history of codesign (collaborative design). Using John Hennessy's RISC research at Stanford as an example, he shows how co-optimizing the compiler and the instruction-set hardware yields better system performance than optimizing each independently.
[10:02-11:20] Explaining why extremely compute-intensive workloads like deep learning demand extreme codesign. Huang describes NVIDIA as the first systems company to codesign globally across CPUs, GPUs, networks, switches, storage, and software frameworks.
[11:21-13:49] Comparing the limits of Moore's Law with NVIDIA's leap in computing power. With Dennard scaling over, general-purpose CPU performance has grown only 10-fold in 10 years. Over the same period, NVIDIA achieved a 1-million-fold increase through global codesign, directly enabling unsupervised pre-training on internet-scale data.
[13:50-17:10] Exploring the evolution of education and textbooks in the AI era. Jensen Huang points out that traditional pre-printed textbooks can no longer keep up with the speed of AI's real-time knowledge generation, and teachers and students should deeply integrate AI into research and learning; he emphasizes that although tools change, the foundation of first principles (such as the Mead & Conway VLSI design methodology) remains solid.
[17:11-19:32] How to choose between open-source and closed-source frontier models. Jensen Huang reveals that every NVIDIA engineer now uses Agents to assist in development. He advises developers to make active use of top-tier closed-source APIs from OpenAI and Anthropic, and explains NVIDIA's motivation for exploring open-source foundation models.
[19:33-21:45] Analyzing the representation-learning needs of domain-specific models. NVIDIA is building foundation models in fields such as biology (BioNeMo), autonomous driving (Alpamayo), and robotics (GR00T) to activate downstream industrial ecosystems.
[21:46-23:52] Explaining how open-sourcing language models supports multilingual fairness, and how language models combine with physical world models. Open-sourcing Nemotron keeps smaller languages (such as Swedish) from being marginalized by commercial models. Combining language models with physical-world priors can cut the training data needed for autonomous driving (Alpamayo) by several orders of magnitude.
[23:53-25:45] Arguing that open, transparent models are irreplaceable in security and defense. A system cannot defend against a black box; transparency is the foundation for collective scrutiny and defense. He proposes deploying large numbers of lightweight AIs (such as Nemotron Nano) as a swarm-like defense network against sophisticated cyberattacks.
[25:46-28:57] Discussing the debate over compute utilization and the low MFU of xAI's Memphis cluster. Jensen Huang rejects judging systems by MFU (Model FLOPs Utilization) alone. Bottlenecks in network, storage, and memory bandwidth cap overall speedup (Amdahl's Law), so systems should overprovision FLOPs.
[28:58-32:04] Analyzing the architectural shift from Hopper to Blackwell. Hopper targets pre-training, while Grace Blackwell NVLink 72 targets the memory-bandwidth bottleneck of inference and decode. By interconnecting 72 GPUs into one domain, it achieved a 50-fold inference performance improvement in 2 years. In decode-dominated workloads MFU is extremely low, yet the number of tokens generated per unit of power (tokens per watt) is extremely high.
[32:05-33:03] Explaining that the art of strategy lies in balancing breadth against focus. Over-specialization (overfitting) sacrifices market scale, while over-generalization (general purpose) sacrifices competitive advantage; the strategist must find the balance between them.
[33:04-38:11] Deconstructing the chip roadmap.
- Hopper corresponds to pre-training.
- Grace Blackwell NVLink 72 corresponds to large-model inference.
- Vera Rubin corresponds to the Agentic paradigm. Storage attaches directly to the high-bandwidth fabric, and a low-latency CPU optimized for single-threaded work keeps GPUs from waiting on tool calls.
- Feynman will target swarms of Agents and sub-agents.
[38:12-41:59] Exploring how to resolve energy as a long-term bottleneck for computing. As continuous generative computing spreads, demand for computing power and energy will grow a thousand-fold. That demand strongly drives market-based investment in sustainable energy such as nuclear and solar, which in turn funds long-overdue upgrades to an aging power grid.
[42:00-45:32] Sharing life and career advice. Jensen Huang questions the dogma of "choose what you love," saying that 90% of a CEO's job is suffering through and solving thorny problems. He stresses that the muscle of resilience can only be built through repeated pain and struggle.
[45:33-47:16] Fun recollections of working at a Denny's in Corvallis, his favorite American meal combinations, and how Denny's shaped his social awareness as a young Chinese-American.
[47:17-50:55] Responding to geopolitically driven chip controls. He strongly opposes comparing general-purpose chips to atomic bombs. Denying other countries general-purpose computing, he argues, disregards civilian uses such as healthcare and gaming. It is also likely to do structural damage to the US semiconductor industry by choking off market demand, echoing the earlier decline of the US telecom industry.
[50:56-52:57] Rebutting futurist fantasies of an AGI singularity. He calls two narratives irresponsible: the "black box" mystique borrowed from neuroscience, and science-fiction scenarios of a singularity instantly destroying humanity. Computer science students, he argues, deserve a rational, technologically optimistic outlook.
[52:58-55:54] Responding to Anj's criticism that US academia is starved of computing power. He argues that chip supply is ample. The core problem is that fragmented university research funding (a coordination failure) leaves universities unable to build large shared clusters. He suggests Stanford use $1 billion of its $40 billion endowment to buy campus-wide computing capacity directly from cloud service providers.
[55:55-58:39] Summarizing the best and worst parts of a CEO's job. The best part is building and testing strategies with top scholars about a complex, uncertain future; the worst is bearing immense organizational responsibility. He recalls four or five near-fatal decisions early in the company's history that brought it to the brink of bankruptcy. Examples include curved-surface rendering and forward texture mapping, which diverged from the industry's triangle-based standard.
[58:40-01:04:30] Reviewing the gains and losses of the Tegra mobile chip effort. Tegra grew into a $1 billion business but was locked out by Qualcomm in the 3G/4G modem era. The setback led NVIDIA to redirect its low-power technology toward robotics (the ancestor of the Thor chip), turning a loss into compounding optionality.
[01:04:31-01:08:18] Summarizing a systematic approach to strategy in the fog of war:
- Build a mental model of the future through observation, first-principles reasoning, and repeated "so what?" questions.
- Work backward to the present path.
- Control opportunity costs during execution.
- Preserve optionality.
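
The gap between the two ten-year figures in the [11:21-13:49] entry is easier to grasp as a yearly rate. The sketch below assumes smooth compound growth; the episode quotes only ten-year totals, not a year-by-year curve. The helper names annual_factor and doubling_time_years are hypothetical.

```python
import math


def annual_factor(total_speedup: float, years: float) -> float:
    """Constant yearly multiplier r such that r ** years == total_speedup."""
    return total_speedup ** (1.0 / years)


def doubling_time_years(total_speedup: float, years: float) -> float:
    """Years needed to double performance at that constant yearly rate."""
    return years * math.log(2) / math.log(total_speedup)


# Ten-year totals quoted in the episode; smooth compounding is an assumption.
for label, total in (("general-purpose CPU", 10.0), ("NVIDIA codesign", 1e6)):
    rate = annual_factor(total, 10)
    doubling = doubling_time_years(total, 10)
    print(f"{label}: {rate:.2f}x per year, doubling every {doubling:.2f} years")
```

Under that assumption, 10x per decade works out to roughly 1.26x per year, or a doubling about every 3 years. By contrast, 1,000,000x per decade is roughly 3.98x per year, a doubling about every 6 months.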

Core Viewpoints List

Computer science is undergoing a fundamental reshaping from "pre-recorded retrieval" to "real-time generation." The traditional computing paradigm essentially pulls and presents pre-recorded images, videos, or program binaries based on instructions; whereas in the Agentic era, computers perform real-time generation and reasoning based on a contextual understanding of intent. [01:17-03:13] | Type: Viewpoint
Against the backdrop of the failure of Dennard Scaling, chip design must shift toward extreme collaborative design (Codesign) of hardware, compilers, and software stacks. The era of general-purpose CPUs relying on semiconductor scaling is over. Through global coordination of CPUs, GPUs, high-speed interconnects, switches, and libraries, NVIDIA achieved a 1-million-fold leap in computing performance within 10 years, whereas traditional hardware-only upgrades would have only yielded a 10-fold improvement. [10:02-12:20] | Type: Fact
Model FLOPs Utilization (MFU) is a narrow metric that easily biases design, and system design requires compute overprovisioning. Dynamic bottlenecks in network latency, storage throughput, and memory bandwidth can leave a system Amdahl-limited. To prevent that, the system must carry ample spare computing power, treating FLOPs as a cheap resource and sacrificing local utilization to guarantee high-concurrency throughput for the task as a whole. [27:11-28:57] | Type: Viewpoint
The decode (inference) phase of large language models is memory-bandwidth-bound, so it needs high-density interconnects (such as NVLink 72) to reach very high energy efficiency. Blackwell achieves a 50-fold tokens-per-watt improvement despite extremely low MFU in decode. It does so by pooling the memory of 72 chips over a high-speed backplane, which avoids the heavy latency of reading and writing memory across network nodes. [29:33-31:30] | Type: Fact
In critical system-design choices, the art of strategy lies in finding a compromise between two failure modes: the narrow market that comes with high specialization, and the mediocrity that comes with generalization. Overfitting to a single task can deliver peak performance but cannot sustain high R&D costs, while over-generalization (general purpose) is inefficient everywhere. Architects must rely on their intuition about where the industry is heading to allocate resources strategically. [32:05-33:03] | Type: Viewpoint
Agent-level computing has given rise to a processor architecture (Vera Rubin) completely different from that of the cloud-services era. While an Agent executes a tool call, the GPU waits. The bottleneck is not multi-core throughput but how quickly the CPU can run complex single-threaded logic. The Rubin architecture therefore strengthens single-threaded, low-latency CPU performance and attaches storage directly to the ultra-high-speed fabric. [36:04-37:52] | Type: Fact
The root cause of the "computing power famine" in university research is a coordination failure in research funding, not chip supply. Universities follow a fragmented model in which individual labs compete separately for small grants, so none of them can afford to build or lease a large centralized cluster. The solution is budget restructuring: the university centrally allocates a special fund on the order of $1 billion to build a campus-wide supercomputing cloud shared by the entire school. [53:27-55:19] | Type: Viewpoint
Resilience against setbacks cannot be learned in a greenhouse; it must be forged at the muscle level by enduring failure and facing desperate situations. 90% of a real career is about pain, challenges, and groping in the dark. The key to success is not pursuing endless happiness, but learning to maintain form during low points and allowing strategic mistakes to crystallize into long-term optionality for the enterprise. [42:00-45:04] | Type: Viewpoint
Attempting to deny other countries general-purpose computing power treats GPUs as if they were atomic bombs, a technical category error. It will also cause long-term self-inflicted damage to the US semiconductor ecosystem. GPUs serve a wide range of civilian workloads such as medical scanning and image rendering. If US semiconductor policy forces the industry to abandon two-thirds of the global market, R&D funding will dry up and the domestic industry will shrink, repeating the earlier decline of the US telecom industry. [47:29-50:34] | Type: Viewpoint
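
The [29:33-31:30] claim that decode can post very low MFU while still producing many tokens follows from a roofline-style argument. The sketch below is a minimal model that rests on several stated assumptions:
- a dense model whose weights are streamed from memory once per decode step;
- about 2 FLOPs per parameter per generated token;
- KV-cache traffic and memory capacity ignored;
- an idealized 72-chip domain that pools bandwidth and FLOPs with zero communication overhead.
The chip figures, the Accelerator type, and decode_step are hypothetical; they are not Blackwell specifications.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    peak_flops: float     # peak FLOP/s (hypothetical figure)
    mem_bandwidth: float  # memory bandwidth in bytes/s (hypothetical figure)


def decode_step(params: float, bytes_per_param: float, batch: int, hw: Accelerator):
    """Roofline-style estimate for one decode step of a dense model.

    Assumes each step streams every weight from memory once (KV-cache traffic
    ignored) and costs about 2 FLOPs per parameter per generated token.
    Returns (tokens_per_second, mfu).
    """
    flops = 2.0 * params * batch
    memory_time = params * bytes_per_param / hw.mem_bandwidth
    compute_time = flops / hw.peak_flops
    step_time = max(memory_time, compute_time)  # the slower resource sets the pace
    tokens_per_second = batch / step_time
    mfu = (flops / step_time) / hw.peak_flops   # achieved FLOP/s over peak FLOP/s
    return tokens_per_second, mfu


# Hypothetical chip: 1 PFLOP/s peak, 8 TB/s memory bandwidth.
single = Accelerator(peak_flops=1e15, mem_bandwidth=8e12)
# Idealized 72-chip domain: bandwidth and FLOPs add up, zero communication cost.
pooled = Accelerator(peak_flops=72 * 1e15, mem_bandwidth=72 * 8e12)

# Hypothetical 70B-parameter model stored in 16-bit weights, batch size 1.
for name, hw in (("1 chip", single), ("72-chip domain", pooled)):
    tps, mfu = decode_step(70e9, 2, 1, hw)
    print(f"{name}: tokens/s={tps:.1f}, MFU={mfu:.1%}")
```

In this idealized model both configurations report the same 0.8% MFU. Yet the pooled domain produces 72 times as many tokens per second (about 4,114 versus about 57), because decode throughput tracks memory bandwidth rather than FLOPs.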

Plain English Retelling

So let's talk about what Jensen Huang said in the Stanford classroom. Most people are marveling at NVIDIA's skyrocketing market value. This conversation, though, lays out his philosophy on the underlying mechanics of the entire computing world, and it is extremely hardcore.

First, we have to understand that computer science is undergoing a complete reshuffle for the first time in 60 years. In the classical era established by the IBM System/360, we used computers in a "retrieval-based" way. Programmers wrote and recorded software, pictures, and videos onto the hard drive in advance, and when you clicked, the computer retrieved them for you to see. Current AI computing, by contrast, is "generated in real time." More interestingly, we are saying goodbye to "on-demand computing." Previously, we opened a webpage or sent a command only when we needed something; in the Agentic era, AI agents run continuously in the background. It is like going from carrying water from a well every day to having pipes installed at home, where the water flows all the time.

This brings about a huge strategic divergence in hardware and software. Many people currently hype Model FLOPs Utilization (MFU), which asks whether the computing power of the GPUs you bought is fully utilized. If utilization is low, they consider it waste. But Jensen Huang poured cold water on this. He believes that excellent system design should accept low MFU and overprovision computing power. Why? In a massive supercomputing cluster, computing power (FLOPs) is actually the cheapest resource; the real bottlenecks lie in network transmission, storage reads, and memory bandwidth. If you insist on squeezing computing power to 100%, the system will stall on those other bottlenecks as soon as it hits sudden data congestion (this is Amdahl's Law). It is like a highway: you can't cram every car onto it just to "maximize highway utilization," because that only causes a massive traffic jam.
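
Amdahl's Law makes the highway analogy quantitative: if only part of a step's time is accelerated, the rest caps the overall gain. The sketch below assumes a step splits cleanly into a compute share and a data-movement share (network, storage, memory). The 30/70 split is a hypothetical profile, not a figure from the episode, and amdahl_speedup is an illustrative helper.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's Law: overall speedup S = 1 / ((1 - p) + p / s) when a fraction p
    of the original runtime is made s times faster and the rest is unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if factor <= 0:
        raise ValueError("factor must be positive")
    p = accelerated_fraction
    return 1.0 / ((1.0 - p) + p / factor)


# Hypothetical step profile: 30% of time doing math, 70% waiting on
# network, storage, and memory.
compute_share = 0.3
for factor in (2, 10, 1e9):
    gain = amdahl_speedup(compute_share, factor)
    print(f"FLOPs x{factor:g}: overall {gain:.2f}x")

# Speeding up the data-movement share instead.
print(f"data movement x2: overall {amdahl_speedup(1 - compute_share, 2):.2f}x")
```

Under that hypothetical profile, even unlimited FLOPs cap the gain near 1.43x, while merely halving data-movement time yields about 1.54x. That is the intuition behind treating FLOPs as cheap and designing around the other bottlenecks.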

This "anti-utilization-only" view directly guided the development of the Blackwell and Rubin chips. For example, Blackwell NVLink 72 was designed to solve the memory-bandwidth problem of decode in AI inference. Even if its MFU looks very low, the tokens it spits out per unit of power have jumped 50-fold. With the Rubin architecture, they went further and designed a CPU with ultra-fast single-threaded performance. When an Agent is executing a tool (like querying a database or calling an API), the GPU sits idle and must wait for the CPU to finish. If the CPU is slow, the expensive GPU cluster spins its wheels. All of this follows from first-principles, whole-system codesign.
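
The Rubin reasoning, that a slow CPU leaves the GPU idle, can be sketched as a strictly serial agent loop. The sketch assumes one agent alternates between GPU generation and a CPU tool call, with no overlap and no batching across agents. The timings and the function names gpu_busy_fraction and turns_per_hour are hypothetical.

```python
def gpu_busy_fraction(gpu_seconds: float, tool_seconds: float) -> float:
    """Share of wall-clock time the GPU does useful work in a strictly serial
    agent loop, where it idles while a CPU-side tool call runs."""
    return gpu_seconds / (gpu_seconds + tool_seconds)


def turns_per_hour(gpu_seconds: float, tool_seconds: float) -> float:
    """Agent turns (generate, then call a tool) completed per hour in the same loop."""
    return 3600.0 / (gpu_seconds + tool_seconds)


# Hypothetical turn: 0.5 s of GPU generation followed by a CPU tool call.
GPU_SECONDS = 0.5
for tool_seconds in (1.0, 0.25):
    busy = gpu_busy_fraction(GPU_SECONDS, tool_seconds)
    turns = turns_per_hour(GPU_SECONDS, tool_seconds)
    print(f"tool call {tool_seconds:.2f} s -> GPU busy {busy:.0%}, {turns:.0f} turns/hour")
```

In this simplified model, cutting tool-call latency from 1 s to 0.25 s raises GPU busy time from about 33% to about 67% and doubles turns per hour without adding any GPU FLOPs. Real deployments interleave many agents on one GPU, so this serial model likely overstates idle time.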

Finally, he also debunked the idea that American universities "can't afford GPUs." He said chips are actually in ample supply. Stanford can't afford them not because NVIDIA is withholding sales, but because the university's incentive structure is broken. Professors each guard their own turf and apply for small grants individually, so no one can accumulate enough money to buy a large cluster: this is "coordination failure." If Stanford really wants its students and professors at the forefront of AI, he said, it should carve $1 billion out of its $40 billion endowment and directly lease a supercomputing cloud for the whole school to share. These remarks were blunt, but they hit the nail on the head about the tension between technological change and outdated organizational structures.

Segments Worth Listening Closely To

[11:21-13:00] Jensen Huang contrasts the end of Moore's Law with the explosion of codesign and explains the physical limits behind the end of Dennard scaling. He also shows why global codesign can produce a 1-million-fold difference in computing power over 10 years. A key passage for understanding modern semiconductor physics and how computing systems evolve.
[27:11-28:57] Deeply deconstructing why the MFU (Model Flops Utilization)-only metric is one-sided, and discussing the system architecture intuition of "overprovisioning computing power to avoid bottlenecks." Highly technical, this is a hardcore chapter that system engineers cannot afford to miss.
[33:15-37:52] Completely outlining the evolutionary roadmap of processor physical architectures from Hopper, Blackwell, Vera Rubin, to Feynman, especially the hardware-level dissection of CPU/GPU coordination and fabric storage mounting under the Agent computing paradigm.
[42:00-45:04] Discussing why "choose what you love" is a one-sided dogma, sharing the reality of "suffering 90% of the time" in a CEO's job, and how to hone resilience against setbacks like training a muscle. Highly sincere and powerful life advice.
[53:27-55:19] A first-principles analysis of the "computing power famine" at academic institutions such as Stanford. He sharply identifies coordination failure as the organizational flaw and proposes a direct fix: committing $1 billion of the endowment.

Resonances with past episodes

Corroborates→ System Design of Venture Capital and Paradigm Shifts in the Age of Intelligence · Ben Horowitz
Both argue that US restrictions on global GPU sales are a severe regulatory regression. Jensen Huang describes how such restrictions bleed the industry's ecosystem and lead to self-inflicted decline. In doing so, he corroborates Ben Horowitz's assertion that the tech industry's lack of voice in policy-making will lead to geopolitical crises.
This[47:29-50:34] Attempting to deny other countries general-purpose computing power treats GPUs as if they were atomic bombs, a technical category error. It will also cause long-term self-inflicted damage to the US semiconductor ecosystem, as losing a large share of the global market will shrink domestic R&D funding.
Related[51:58-52:45] The tech industry's lack of voice in Washington's policy-making will bring unbearable geopolitical crises and regulatory regressions to the industry's development, such as government restrictions on the global sale of GPUs.
Isomorphic→ Frontier Systems Compute and the Context Loop War · Anjney Midha
The 'extreme codesign' explained by Jensen Huang (deeply binding chips, networks, and software stacks to squeeze out performance) explains at the underlying mechanism level why computing power across different generations and manufacturers is highly incompatible at the micro level and cannot become a commoditized good.
This[10:02-12:20] Chip design must shift toward extreme collaborative design (Codesign) of hardware, compilers, and software stacks, improving performance through global coordination of processors, high-speed interconnects, switches, and libraries, rather than relying on single hardware scaling.
Related[48:28-51:01] GPU computing power is not a commoditized common good; not only are chips from different manufacturers irreplaceable, but even different generations of chips from the same manufacturer (such as H100 and B300) are incompatible at the micro level.
Validation← AI Overexpansion's Hard Grid Barriers and Energy Arbitrage · Chase Lock Miller
Jensen Huang lays out NVIDIA's GPU codesign strategy in CS153. Chase confirms, from the perspective of a data center infrastructure buyer, how that strategy is propagating across the industry. Together they form a supply-and-demand dialogue between the chip layer and the infrastructure layer.
This[10:02-12:20] Through extreme codesign of CPU, GPU, high-speed interconnects, and libraries, NVIDIA achieved a 1,000,000x jump in compute performance over 10 years. Every order of magnitude of that compute needs matching growth in power and data centers to absorb it.
Related[08:56-10:51] The AI supply chain's dynamic bottleneck has shifted from chips to "energized data centers." Owning expensive chips without grid access means owning useless silicon, so access to the physical grid has become the new competitive moat.
Validation← Compute, Trading, and Hiring: Jane Street's Technology and Organizational Philosophy · Ron Minsky & Dan Ponttovo
Jensen Huang lays out the technical substrate of GPU compute scaling. From the demand side, Jane Street's experience buying compute at scale confirms the real-world impact of NVIDIA's codesign strategy.
This[10:02-12:20] With Dennard scaling exhausted, NVIDIA achieved a 1,000,000x jump in compute performance over 10 years through global codesign of CPU, GPU, high-speed interconnects, switches, and libraries.
Related[17:01-18:47] Water-cooling megawatt-class racks and delivering AC and 800 V DC power pose hard challenges. Physical engineering is an underestimated system-design bottleneck in the compute scaling race.
Complementary← Computational Design and Synthetic Biology · Neri Oxman
Both explore how to cope with downturns and resets in one's career. Jensen Huang emphasizes forging resilience against setbacks by enduring failure and groping in the dark; whereas Oxman points out that only by binding one's life to a bottom-up "calling" rather than an external "career track" can one maintain creativity and sensitivity through constant restarts.
This[42:00-45:04] Resilience against setbacks cannot be learned in a greenhouse; it must be forged at a muscular level by enduring failure and facing desperate situations. A real career is 90% about pain, challenges, and groping in the dark. The key to success is not pursuing endless happiness, but learning to maintain form without distortion during low points...
Related[48:27-48:59] A career is a top-down social track imposed on an individual by external social structures, whereas a calling is a bottom-up direction of intrinsic value and flow. Only by binding one's life to a calling can one maintain creative sensitivity through constant restarts.
Corroborates← The Reality of Security Crises and Organizational Resilience · Joe Sullivan
Both argue that "resilience" cannot be acquired by evading mistakes or in a greenhouse, but must be forged by facing crises head-on and reflecting and rebuilding after experiencing real failure, desperation, or disaster.
This[42:00-45:04] Resilience against setbacks cannot be learned in a greenhouse; it must be forged at a muscular level by enduring failure and facing desperate situations.
Related[40:01-40:20] True personal reputation and professional resilience do not come from deliberately avoiding disasters, but from public reflection and community rebuilding after a disaster.

Tensions with past episodes

Contradiction (direct conflict)← The Discipline of Value Delivery per Gigawatt · Amin Vahdat
The two disagree fundamentally about Model FLOPs Utilization (MFU) in system design. Amin Vahdat treats low MFU as a warning sign of system imbalance and wasted compute. Jensen Huang instead advocates actively overprovisioning compute and accepting low MFU. He treats spare compute as a cheap resource for working around system-level bottlenecks in network and memory bandwidth.
This[27:11-28:57] Model FLOPs Utilization (MFU) is a narrow metric that easily biases design; system design requires compute overprovisioning. Dynamic bottlenecks in network latency, storage throughput, and memory bandwidth can leave a system Amdahl-limited. To prevent that, the system must carry sufficient spare compute, treating FLOPs as a cheap resource and sacrificing local utilization to ensure high-concurrency throughput for the overall task.
Related[14:06-15:15] System Balance is key to unleashing compute power. If FLOPs increase but HBM throughput, SRAM cache, and network bandwidth do not scale proportionally, compute power will be wasted waiting for data, resulting in extremely low MFU.

A faithful reconstruction and plain-language retelling of the episode, generated by PodLens.

This is one source-grounded reading, not a replacement for the original. Every point is anchored to its source, so you can check it yourself — and corrections are welcome.