145 TWh: Data Center Cooling Costs Surpass the Chips It Cools

Data centers consumed 485 terawatt-hours of electricity in 2025. Of that total, 30 percent, roughly 145 TWh, went to cooling alone. That single line item exceeds the entire annual electricity consumption of Sweden. The number comes from an analysis by New Atlas, published on May 21, and it is the cleanest summary yet of a problem the AI industry spent most of the decade ignoring: every token a model generates has a thermal shadow, and the air conditioners that once dispersed it are now the bottleneck.

The physics is straightforward. A single NVIDIA H100 GPU draws 700 watts at full load. An eight-GPU HGX baseboard pulls north of 5.6 kilowatts. A rack packed with Blackwell B200 GPUs, which ship at 1,000 watts per chip, reaches 120 kilowatts and beyond. At those densities, air cooling fails. Network World reported in March that the data center's 40-year reliance on air has "hit a physical limit." The industry is now mid-pivot toward direct-to-chip liquid cooling and immersion systems, a capital expenditure cycle that will reshape who builds data centers, where they get built, and what a kilowatt-hour of inference actually costs.
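The density figures above are simple multiplication, and a minimal Python sketch makes the units explicit. The per-chip wattages are the ones quoted in this article; the function name `gpu_load_kw` is hypothetical, and the calculation counts GPU draw only, ignoring CPUs, fans, networking, and power-conversion losses.

```python
def gpu_load_kw(gpu_count: int, watts_per_gpu: float) -> float:
    """GPU-only electrical load in kilowatts.

    Assumption: every GPU runs at its full rated draw; host CPUs,
    fans, and networking gear are not counted.
    """
    return gpu_count * watts_per_gpu / 1000.0


# Eight H100s at 700 W each on one HGX baseboard.
hgx_kw = gpu_load_kw(8, 700)

# How many 1,000 W B200s it takes for GPU draw alone to fill a 120 kW rack budget.
b200_count_for_120kw = 120_000 / 1000

print(f"HGX baseboard, GPUs only: {hgx_kw:.1f} kW")
print(f"B200s needed to reach 120 kW on GPU draw alone: {b200_count_for_120kw:.0f}")
```

Because the sketch leaves out everything except the GPUs, real rack totals will run higher than these numbers for the same chip count.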

The speed of the transition is visible in procurement calendars. Supermicro and Verda announced on May 27 a deployment of liquid-cooled NVIDIA Blackwell-accelerated systems for what they call a "vertically integrated, full-stack AI cloud." The press release uses the word "sustainable" twice in the headline, a signal that infrastructure vendors now market cooling efficiency as a competitive differentiator rather than a facilities footnote. Five years ago, liquid cooling was exotic. In 2026, an air-cooled Blackwell deployment would be an engineering curiosity.

The thermal energy storage market for AI data centers is growing alongside it. A global market analysis published by Yahoo Finance on May 20 projects the sector at more than $4.5 billion, driven by hyperscale expansion and the need for sustainable energy management. The report explicitly ties the opportunity to "rising demand for efficient cooling solutions due to AI workloads." Thermal storage, the practice of capturing excess heat or cold for later use, is no longer a green checkbox. It is becoming a cost line item that determines whether a given campus can operate 24 hours a day at the power densities that AI training requires.

Schneider Electric's phased delivery of more than $290 million in AI infrastructure to TeraWulf's Lake Mariner campus in upstate New York, reported by TMCnet on May 26, includes Motivair liquid cooling technologies. The deal is backed by Google and sits on a site originally developed for Bitcoin mining. The migration of compute from proof-of-work hashing to transformer training, on the same power infrastructure, is a pattern repeating across Texas, North Dakota, and the hydropower corridors of the Pacific Northwest. The grid connection and the cooling loop are the new moat; the chip SKU inside the rack is almost secondary.

Then there is Europe. CNBC reported on May 18 that high energy prices could derail the continent's AI ambitions just as it tries to compete with the U.S. and China. Electricity costs vary widely: France, Portugal, and Spain have weekly average prices below €95 per megawatt-hour, according to pv magazine, while Germany and much of Northern Europe sit above that threshold. For a 100-megawatt data center running at 85 percent utilization, a €30/MWh differential translates to roughly €22 million per year. That single variable can flip a campus from net positive to net negative before a single GPU is racked.
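The €22 million figure can be reproduced with a back-of-the-envelope calculation. The sketch below assumes that "85 percent utilization" means the campus draws 85 percent of its 100-megawatt capacity on average across all 8,760 hours of a non-leap year; the function name `annual_cost_gap_eur` is hypothetical.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours; leap years ignored


def annual_cost_gap_eur(capacity_mw: float,
                        utilization: float,
                        price_gap_eur_per_mwh: float) -> float:
    """Yearly electricity cost difference between two price zones.

    Assumption: utilization is the average fraction of nameplate
    capacity actually drawn over the whole year.
    """
    energy_mwh = capacity_mw * utilization * HOURS_PER_YEAR
    return energy_mwh * price_gap_eur_per_mwh


gap = annual_cost_gap_eur(capacity_mw=100, utilization=0.85,
                          price_gap_eur_per_mwh=30)
print(f"Annual cost gap: EUR {gap / 1e6:.1f} million")
```

Under these assumptions the campus consumes about 744,600 MWh a year, so each €1/MWh of price difference is worth roughly €0.74 million annually.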

Denmark is the cautionary tale. Its grid operator has warned that data center demand is overwhelming transmission capacity, and the government is considering limits on further buildout, CNBC reported. The Nordic countries were supposed to be Europe's compute haven: cold ambient air, abundant hydropower, and political stability. But the per-megawatt math that penciled out for 15-kilowatt racks breaks down when every cabinet pulls 100 kilowatts and needs a closed-loop liquid system regardless of latitude.

The United States faces its own version of the same arithmetic. The Energy Information Administration said in its May Short-Term Energy Outlook that U.S. power consumption, which hit record highs in 2025, will rise further in 2026 and 2027, with AI data centers named as a primary driver, Reuters reported. The grid interconnection queue for new large-load customers now stretches beyond five years in Northern Virginia, the densest data center market on the planet. Developers are filing utility connection requests before they have secured the land.

Coolant is not the only fluid moving through these facilities. A single data center in Spartanburg, South Carolina, estimates it will use 460,000 gallons of water per day, according to The Post and Courier, almost 2,000 times the consumption of an average residential customer. In California, data centers have increased their water draw by 96 percent since 2019, often siting in drought-stressed communities while bypassing the kind of environmental review applied to other industrial users. The opacity is not accidental; disclosure rules vary by state, and many operators treat water consumption figures as proprietary.

The industry response has been to market closed-loop liquid cooling as a water-saving technology. Direct-to-chip systems recirculate coolant through a sealed loop that dissipates heat via an external dry cooler or cooling tower. The water draw per megawatt drops sharply compared to evaporative air cooling. But the capital cost per rack rises, and the operational complexity increases: a leak in a $250,000 GPU tray is not the kind of incident any facility manager wants to explain to an insurer. The risk profile of the asset class is shifting along with the thermal envelope.

What none of the press releases answer is the per-token question. A training run on a cluster of 10,000 H100s over 90 days at a PUE of 1.1 burns roughly 16.6 gigawatt-hours, counting only the GPUs' 700-watt draw. That figure is increasingly well understood. The inference side is murkier. A single query to a large reasoning model may produce 500 to 1,500 output tokens at variable batch sizes, and the energy cost per token depends on whether the serving cluster is running at batch size 32, where throughput per watt peaks, or batch size 1, where latency is minimized but the GPUs sit partially idle. Two deployments of identical hardware serving the same model can have per-token energy costs that differ by a factor of five.
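The training-run estimate follows from GPU count, per-GPU draw, run length, and PUE. The sketch below is one way to reproduce it, assuming every GPU runs at the 700-watt full-load figure cited earlier for the entire 90 days and that PUE multiplies GPU energy to give facility energy. Host CPUs, networking, and storage are left out, and the function name `training_energy_gwh` is hypothetical.

```python
def training_energy_gwh(gpu_count: int,
                        watts_per_gpu: float,
                        days: float,
                        pue: float) -> float:
    """Facility-level energy for a training run, in gigawatt-hours.

    Assumptions: GPUs draw their full rated power for the entire run,
    and PUE scales GPU energy up to total facility energy.
    """
    gpu_kw = gpu_count * watts_per_gpu / 1000.0   # total GPU load in kW
    hours = days * 24
    gpu_kwh = gpu_kw * hours                      # GPU energy in kWh
    return gpu_kwh * pue / 1e6                    # kWh -> GWh


energy = training_energy_gwh(gpu_count=10_000, watts_per_gpu=700,
                             days=90, pue=1.1)
print(f"Estimated training energy: {energy:.1f} GWh")
```

Of that total, the PUE overhead alone (the 0.1 above 1.0) accounts for about 1.5 GWh, which is the portion that cooling and power distribution improvements can address.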

The public record does not contain a reliable, model-level breakdown of inference energy cost per token across the major providers. OpenAI, Anthropic, Google DeepMind, and Meta do not disclose granular energy metrics per API call. What exists are academic estimates and third-party extrapolations that anyone in procurement treats as directional at best. The absence of this data point is not a footnote; it is the entire question of who captures the margin in the compute stack. If a model provider charges $15 per million output tokens, of which electricity and cooling overhead costs $2, chip rent costs $5, and the remaining $8 goes to software and margin, every fractional improvement in cooling efficiency drops almost directly to the bottom line.
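The illustrative $15 split can be turned into a small sensitivity check. The dollar figures below are the article's hypothetical numbers, not disclosed provider data, and the 10 percent efficiency gain is an arbitrary assumption chosen only to show how a saving flows through. The function name `remainder_after_saving` is likewise hypothetical.

```python
# Hypothetical per-million-output-token economics from the example above.
PRICE_USD = 15.0
ENERGY_AND_COOLING_USD = 2.0
CHIP_RENT_USD = 5.0


def remainder_after_saving(overhead_reduction: float) -> float:
    """Software-and-margin dollars left per million tokens.

    overhead_reduction is the fractional cut in the energy-and-cooling
    cost (0.10 means a 10 percent cut). Assumption: the price and the
    chip rent stay fixed, so the whole saving goes to the remainder.
    """
    overhead = ENERGY_AND_COOLING_USD * (1.0 - overhead_reduction)
    return PRICE_USD - CHIP_RENT_USD - overhead


baseline = remainder_after_saving(0.0)
improved = remainder_after_saving(0.10)   # assumed 10 percent efficiency gain
uplift = (improved - baseline) / baseline

print(f"Baseline remainder: ${baseline:.2f} per million tokens")
print(f"After 10% overhead cut: ${improved:.2f} ({uplift:.1%} higher)")
```

In this toy split a 10 percent cut in overhead adds $0.20 per million tokens, lifting the software-and-margin share by 2.5 percent without any change in price.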

The Yahoo Finance market report on the AI supercomputing platform market, published May 27, frames the competition as one between NVIDIA's integrated systems and hyperscaler custom silicon. Missing from that framing is the question of who owns the cooling stack. A vertically integrated vendor like Supermicro, which manufactures the chassis, the liquid cooling distribution manifold, and the rack-level plumbing, captures margin at each layer. A hyperscaler that designs its own server boards but buys third-party cooling components may have lower upfront procurement costs and higher mid-life maintenance overhead. The total cost of ownership calculations are different enough between these approaches that no single benchmark can resolve them.

The geographic reshuffling is already underway. Northern Virginia's queue is pushing developers toward Ohio, Indiana, and the Carolinas. Europe's data center capacity is migrating from Frankfurt toward the Nordics and the Iberian Peninsula, where renewable penetration keeps spot electricity prices lower. Latin America and Southeast Asia are seeing early-stage hyperscale interest, constrained mainly by the availability of 100-plus megawatt grid connections and the skilled labor to maintain liquid-cooled infrastructure. The map of AI compute in 2030 will look less like the internet backbone map of 2010 and more like the aluminum smelting map of 1950: power-price arbitrage with a compute overlay.

One structural uncertainty hangs over all of this: the speed at which inference moves from cloud to device. Apple's on-device models, Qualcomm's Snapdragon AI Engine, and the growing capability of sub-10-billion-parameter models mean that a nontrivial fraction of consumer-facing inference cycles may never touch a data center cooling loop at all. If on-device inference captures 40 percent of total token generation by 2028, the demand curve for data center cooling shifts from exponential to something closer to linear. The $4.5 billion thermal storage market and the $290 million campus-level cooling investments are bets that the cloud remains the default location for inference, a thesis that is not yet falsified but is also not guaranteed.

In the near term, the checkpoints are specific and observable. Watch for the next quarterly filings from Carrier Global, which reported a 35 percent rise in global Commercial HVAC orders in Q1 2026 and explicitly cited AI data center cooling as a growth driver. Watch for the EIA's midyear update to its power consumption forecast. And watch for the first major hyperscaler to break out cooling cost as a separate line item in its infrastructure spend, a disclosure that would allow the market to price the per-token cooling margin for the first time. Until that number is public, everyone in the stack is guessing.
