The Discipline of Value Delivery per Gigawatt · Amin Vahdat

2026-06-09 · A faithful, transcript-grounded reading by PodLens

Original episode: https://youtu.be/VeTqsCpcDgg?si=Hv0WYODnPWCCT-qP

Compute Infrastructure · Value per Gigawatt · Data Centers · Energy Efficiency · System Co-design

What This Episode Is About

This episode features a guest lecture in Stanford CS153 (Frontier Systems) by Amin Vahdat, Vice President of Systems Infrastructure at Google. The lecture is built around the theme of "Value per Gigawatt." Its argument is that in the era of AI scaling, compute infrastructure should be judged less by raw capacity and scale (megawatts, gigawatts, FLOPs) and more by the user value and useful output it actually delivers (goodput, daily active users). Drawing on nearly 30 years of Google infrastructure practice, Vahdat examines system balance, Amdahl's Law, the trade-offs surrounding high availability and reliability, and the power-supply and supply-chain constraints of hyperscale systems. He also shares Google's experience with TPU interconnect technology such as Optical Circuit Switching (OCS), with hardware-software co-design, and with environmental and community responsibility.

Key Takeaways List

  1. The true measure of compute capacity is the value it actually delivers, per dollar or in user activity such as daily active users, rather than raw gigawatts or hardware FLOPs. - Evidence Anchor: [04:55-05:08] - Type: Opinion - Speaker's Reservations: He admits to spending enormous effort on procuring data center power capacity, but still insists that the value metric must come first.
  2. Large-model training on modern accelerators is highly synchronous. This pushes clusters back from loosely coupled, fault-tolerant architectures to tightly coupled, supercomputer-style systems in which a single failure halts the entire job. - Evidence Anchor: [12:29-13:57] - Type: Fact
  3. Frontier labs, both as internal and as external customers, are showing a new attitude: to get more compute capacity, they are willing to give up some service reliability (accepting 99.9% or even lower availability). - Evidence Anchor: [11:26-12:28] - Type: Opinion
  4. System balance is the key to unlocking compute. If FLOPs grow but HBM bandwidth, on-chip SRAM, and network bandwidth do not scale proportionally, the compute units sit idle waiting for data and model FLOPs utilization (MFU) ends up very low. - Evidence Anchor: [14:06-15:15] - Type: Opinion
  5. The spread of sparse architectures such as Mixture of Experts (MoE) leaves most current hardware, which was not designed with matching memory systems, facing severe memory-bandwidth shortages. - Evidence Anchor: [16:53-17:34] - Type: Fact
  6. As AI shifts from being training-dominated to inference (serving)-dominated, compute deployment will gradually move away from massive contiguous clusters toward smaller, geographically dispersed, flexibly schedulable sites of under 100 megawatts. - Evidence Anchor: [25:05-25:24] - Type: Prediction
  7. Hardware specialization is the inevitable answer to the slowdown in CPU performance scaling. Google's split into 8i (inference) and 8t (training) TPUs follows from the different ratios of memory, network, and compute that each workload requires. - Evidence Anchor: [47:20-48:50] - Type: Opinion
  8. Compute hardware will remain the primary bottleneck for the next 5 to 10 years. Per the Jevons paradox, any energy-efficiency gain from algorithmic breakthroughs will quickly be absorbed by new, more valuable demand for compute. - Evidence Anchor: [58:40-01:00:05] - Type: Prediction
  9. Zero-sum, "winner-take-all" thinking is a narrow technical perspective. A healthy supply chain needs diversity to hedge against concentration risks such as geopolitical disruption and earthquakes, and component manufacturers do not want a single customer to monopolize their capacity. - Evidence Anchor: [49:37-52:56] - Type: Opinion
  10. Data centers should be assets to their communities rather than burdens. This requires builders to make deliberate trade-offs. Examples include using dry cooling in water-scarce regions at the cost of 10% energy efficiency, or using demand response to help the public grid shave peak demand and absorb off-peak surplus.
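Takeaway 2 says synchrony turns a large cluster back into a fragile supercomputer. A minimal sketch makes the reason visible, under a simplifying assumption that is not from the episode: each accelerator fails independently, with the same hypothetical probability per hour. A synchronous job survives an interval only if every accelerator does, so its survival probability is (1 - p)^N and drops quickly as N grows. The failure rate below is made up for illustration and is not a Google figure.

```python
def sync_job_survival(num_accelerators: int, hourly_failure_prob: float, hours: float) -> float:
    """Probability that a synchronous job runs `hours` with no accelerator failing.

    Assumes independent, identically distributed failures (a simplification).
    """
    per_chip_survival = (1.0 - hourly_failure_prob) ** hours
    # Every chip must survive, because one failure halts the whole synchronous job.
    return per_chip_survival ** num_accelerators


def mean_hours_between_halts(num_accelerators: int, hourly_failure_prob: float) -> float:
    """Expected hours between job-halting failures for the whole synchronous job."""
    # Probability that at least one chip fails in a given hour.
    p_any_failure = 1.0 - (1.0 - hourly_failure_prob) ** num_accelerators
    return 1.0 / p_any_failure


p = 1e-5  # hypothetical per-accelerator failure probability per hour
for n in (1, 1_000, 10_000, 50_000):
    print(f"{n:>6} chips: P(clean 24 h run) = {sync_job_survival(n, p, 24):.3f}, "
          f"mean hours between halts = {mean_hours_between_halts(n, p):.1f}")
```

In this model, a failure rate that is negligible for one chip becomes a halt roughly every ten hours at 10,000 chips. This is why fast recovery matters as much as raw reliability.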

Plain English Retelling

These days we are often dazzled by huge numbers, such as "some company just secured 1 gigawatt of power" or "this data center campus cost tens of billions of dollars." Amin Vahdat pours cold water on this. If your system design is unbalanced, or if it goes down every day, all those gigawatts are just expensive decorations burning electricity. It's like buying a supercar with an enormous top speed and then driving it on a narrow, potholed, muddy road where the engine's horsepower (the FLOPs) can't turn into speed. What really matters is how much cargo the car actually delivers and how much money it earns. That is the "actual value delivered per gigawatt."
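The "value per gigawatt" idea can be made concrete with a back-of-the-envelope sketch. The formula is our own simplification, not one given in the episode: useful output is peak FLOPs × MFU × goodput (the fraction of time the job is actually making progress), divided by facility power. Every number below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    name: str
    peak_exaflops: float  # hypothetical peak compute of the whole cluster, in EFLOP/s
    mfu: float            # model FLOPs utilization, 0..1 (reflects system balance)
    goodput: float        # fraction of wall-clock time spent on useful progress, 0..1
    facility_gw: float    # total facility power draw, in gigawatts

    def useful_exaflops(self) -> float:
        # Only FLOPs that are both well fed (MFU) and not lost to failures or rollbacks (goodput) count.
        return self.peak_exaflops * self.mfu * self.goodput

    def useful_exaflops_per_gw(self) -> float:
        return self.useful_exaflops() / self.facility_gw


big_but_unbalanced = Cluster("big, unbalanced", peak_exaflops=200.0, mfu=0.20, goodput=0.70, facility_gw=1.0)
smaller_but_balanced = Cluster("smaller, balanced", peak_exaflops=120.0, mfu=0.50, goodput=0.95, facility_gw=0.6)

for c in (big_but_unbalanced, smaller_but_balanced):
    print(f"{c.name}: {c.useful_exaflops():.1f} useful EFLOP/s, "
          f"{c.useful_exaflops_per_gw():.1f} per GW")
```

With these made-up inputs, the smaller but better-balanced and more reliable cluster delivers more useful work in absolute terms and far more per gigawatt. That is the kind of comparison the gigawatt headline hides.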

In the past, internet services such as Google Search pursued "five nines" of availability (99.999%), which allows only about five minutes of downtime a year. To achieve this, power and networking had to be fully duplicated, so half of the power supply and equipment sat idle most of the time. Frontier labs training large models have flipped this attitude completely. Offer them twice the compute and they will gladly accept it, even if it breaks down entirely for a few days a year. Training is a "compute-devouring beast," so they care far more about finishing training quickly than about never going offline.
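The availability figures above translate directly into downtime budgets. This short sketch does that arithmetic. It assumes a 365.25-day year, and the list of availability levels is chosen only for illustration.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year length, including leap years


def annual_downtime_minutes(availability: float) -> float:
    """Expected downtime per year for a given availability (e.g. 0.99999 for five nines)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for label, a in [("three nines", 0.999), ("four nines", 0.9999), ("five nines", 0.99999)]:
    minutes = annual_downtime_minutes(a)
    print(f"{label} ({a:.3%}): {minutes:,.1f} min/year, about {minutes / 60:.2f} h/year")
```

Five nines comes out to roughly 5.3 minutes a year. The 99.9% that the labs are willing to accept allows close to nine hours, and "a few days a year" corresponds to availability around 99% or below.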

But this creates a very hard engineering problem. Past internet services were loosely coupled: if one server broke, others took over and users never noticed. Today's AI model training is synchronous, because all TPUs/GPUs must exchange gradients and parameters frequently and in lockstep. So if even one of tens of thousands of accelerators drops out because of a network or cooling issue, the entire training job has to halt and roll back. This demands a highly reconfigurable interconnect, such as the Optical Circuit Switching (OCS) technology Google uses. Simply put, OCS is like an automated fiber-optic patch panel that re-plugs connections on its own. It contains hundreds of tiny mirrors that can be tilted to steer light beams in three dimensions. Once a device in some rack is found to be faulty, software re-aims the mirrors to route around the broken node and swap in a spare, restoring the whole system in seconds.
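Here is a toy model of the spare-swap idea. It rests on a simplification of our own: the optical switch is treated as a table that maps each logical position in the training job's topology to a physical accelerator block. "Re-aiming the mirrors" then reduces to rewriting one entry of that table. The class `OpticalCircuitSwitch`, the method `swap_in_spare`, and the rack names are all hypothetical and do not correspond to any Google API.

```python
class OpticalCircuitSwitch:
    """Hypothetical toy model: logical slot -> physical block, plus a pool of idle spares."""

    def __init__(self, active_blocks: list[str], spare_blocks: list[str]) -> None:
        # Each logical slot of the job's topology is wired to one physical block.
        self.circuits: dict[int, str] = dict(enumerate(active_blocks))
        self.spares: list[str] = list(spare_blocks)

    def swap_in_spare(self, failed_block: str) -> str:
        """Re-point the circuit that reached `failed_block` at a spare block."""
        for slot, block in self.circuits.items():
            if block == failed_block:
                if not self.spares:
                    raise RuntimeError("no spare blocks left; the job must wait for repair")
                replacement = self.spares.pop(0)
                # In real hardware this would mean re-steering mirrors; here it is one table write.
                self.circuits[slot] = replacement
                return replacement
        raise KeyError(f"{failed_block} is not part of the active topology")

    def topology(self) -> list[str]:
        """Physical blocks in logical-slot order, as the training job sees them."""
        return [self.circuits[slot] for slot in sorted(self.circuits)]


ocs = OpticalCircuitSwitch(active_blocks=["rack-A", "rack-B", "rack-C", "rack-D"],
                           spare_blocks=["rack-S1", "rack-S2"])
ocs.swap_in_spare("rack-C")
print(ocs.topology())  # ['rack-A', 'rack-B', 'rack-S1', 'rack-D']
```

In this model the job's logical topology stays the same (slot 2 is still slot 2). Only the physical block behind that slot changes, which is what makes a fast, in-place recovery plausible.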

Another pain point most people overlook is system balance. Many buyers judge accelerators only by their TFLOPs. With the rise of sparse computation such as Mixture of Experts (MoE), though, compute itself is not the hardest part; the hardest part is feeding data to it. If the HBM (High Bandwidth Memory) is not fast enough and the network is not wide enough, the compute units just sit idle waiting. This is why model FLOPs utilization (MFU) is generally so low today. Hardware specialization will become increasingly prominent, and this is exactly why Google introduced the 8i TPU for inference and the 8t TPU for training. Inference must repeatedly fetch different data, while training runs massive synchronous computation, so the two need very different ratios of memory, network, and compute.
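The "compute sitting idle waiting for data" argument is essentially the roofline model. That framing is ours; the speaker does not name it. Attainable throughput is capped at min(peak FLOP/s, memory bandwidth × arithmetic intensity), where arithmetic intensity is the number of FLOPs performed per byte moved. The chip figures and workload intensities below are illustrative placeholders, not the specs of any TPU. The resulting utilization is an upper bound on MFU.

```python
def attainable_flops(peak_flops: float, mem_bw_bytes: float, intensity_flops_per_byte: float) -> float:
    """Roofline model: throughput is limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, mem_bw_bytes * intensity_flops_per_byte)


def utilization(peak_flops: float, mem_bw_bytes: float, intensity: float) -> float:
    """Upper bound on the fraction of peak compute usable at this arithmetic intensity."""
    return attainable_flops(peak_flops, mem_bw_bytes, intensity) / peak_flops


# Hypothetical accelerator: 1,000 TFLOP/s peak compute, 4 TB/s of HBM bandwidth.
PEAK = 1000e12
HBM_BW = 4e12
ridge_point = PEAK / HBM_BW  # FLOPs per byte needed before the chip becomes compute-bound

# Illustrative intensities: a large dense matmul reuses each byte many times, while a
# sparse, MoE-style decode step touches many expert weights for few tokens.
for label, intensity in [("dense training matmul", 400.0), ("sparse / MoE-style decode", 20.0)]:
    print(f"{label}: at most {utilization(PEAK, HBM_BW, intensity):.0%} of peak")
print(f"ridge point: {ridge_point:.0f} FLOPs/byte")
```

Raising `PEAK` in this model does nothing for the memory-bound workload. Only more bandwidth, or a higher arithmetic intensity, moves it, which is the system-balance point made above.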

Finally, Amin Vahdat warns against falling into a "do-or-die," zero-sum mindset. Even with algorithmic breakthroughs (such as the move from LSTMs to Transformers, which improved efficiency by 5x), hardware will never be in surplus. Humanity's appetite for intelligence is effectively unlimited, so any compute that is saved is immediately taken up by newer, more valuable applications (such as more complex collaboration among agents). Meanwhile, the real bottleneck is shifting from chip manufacturing to a deeper physical constraint: energy. The central infrastructure challenge of the next decade will be acquiring and scheduling green energy efficiently. It will also mean letting data centers act as "batteries" for local power grids, curtailing load or disconnecting during peak residential demand and absorbing surplus power during off-peak hours.
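The "data center as a battery" idea can be sketched as a greedy load-shifting scheduler. This is our own minimal illustration, not Google's demand-response system. Each hour has some grid headroom (what the grid can spare for the site), and the site has a fixed non-deferrable load. It also has a daily budget of deferrable work, for example checkpointable training. The scheduler places that deferrable work in the hours with the most spare grid capacity first. The function name and all inputs are hypothetical.

```python
def schedule_flexible_load(grid_headroom_mw: list[float], base_load_mw: float,
                           flexible_mwh: float, max_site_mw: float) -> list[float]:
    """Hypothetical greedy demand response: fill off-peak hours first, stay at base load during peaks.

    grid_headroom_mw[h]: power the grid can spare for the site in hour h.
    base_load_mw: non-deferrable load (e.g. serving) that runs every hour.
    flexible_mwh: deferrable energy (e.g. checkpointable training) to place over the window.
    max_site_mw: the site's own power ceiling.
    Returns the planned site draw per one-hour slot, in MW.
    """
    plan = [base_load_mw] * len(grid_headroom_mw)
    remaining = flexible_mwh
    # Visit hours from most to least spare grid capacity.
    for hour in sorted(range(len(grid_headroom_mw)), key=lambda h: grid_headroom_mw[h], reverse=True):
        if remaining <= 0:
            break
        room = min(grid_headroom_mw[hour], max_site_mw) - plan[hour]
        if room > 0:
            added = min(room, remaining)  # one-hour slots, so MW added == MWh placed
            plan[hour] += added
            remaining -= added
    if remaining > 1e-9:
        raise ValueError("not enough off-peak headroom for the flexible work")
    return plan


# Toy six-hour window in which hours 2-4 are the residential peak (low headroom).
plan = schedule_flexible_load([80.0, 90.0, 40.0, 20.0, 30.0, 85.0],
                              base_load_mw=20.0, flexible_mwh=150.0, max_site_mw=100.0)
print(plan)  # [35.0, 90.0, 20.0, 20.0, 20.0, 85.0]
```

The site stays at its base load through the peak and soaks up power when the grid has slack. A real system would also have to respect grid signals, job deadlines, and checkpoint costs, which this sketch ignores.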

A faithful reconstruction and plain-language retelling of the episode, generated by PodLens.

This is one source-grounded reading, not a replacement for the original. Every point is anchored to its source, so you can check it yourself — and corrections are welcome.