AI Data Center Cooling Costs Reshape $700 Billion Infrastructure Boom

Hyperscale data center investments are soaring, but efficiency now hinges on kilowatts per rack, cost per token, and a liquid cooling market growing 19.2 percent annually.

[Image: Nvidia liquid-cooled AI server racks with closed-loop cooling pipes and cable management inside a data center. Credit: techspot.com]

Hyperscale cloud and AI providers are committing over $700 billion in 2026 to expand data center capacity, MSN reported on May 3, citing industry capex forecasts. The figure captures more than server racks and concrete: it includes the cost of keeping chips from melting. At current rack densities, cooling alone can consume 30 to 40 percent of a data center's total energy budget. For facilities running Nvidia H100 or B200 clusters at scale, the difference between air cooling and direct-to-chip liquid cooling is not a marginal optimization. It determines whether a $40,000 GPU operates at its rated thermal design power or throttles to protect itself, burning capital while delivering fewer tokens per second.
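
To make that concrete, here is a minimal back-of-envelope sketch of how throttling inflates capital cost per token. The $40,000 price comes from the paragraph above; the throughput figures, amortization period, and 30 percent throttle depth are illustrative assumptions, not measured H100 or B200 numbers.

```python
# Illustrative sketch of how thermal throttling inflates cost per token.
# Only the $40,000 GPU price and the throttling mechanism come from the
# article; throughput and amortization figures are assumed placeholders.

GPU_PRICE_USD = 40_000          # purchase price cited in the article
AMORTIZATION_YEARS = 3          # assumed depreciation schedule
SECONDS_PER_YEAR = 365 * 24 * 3600

def cost_per_million_tokens(tokens_per_second: float) -> float:
    """Capital cost (USD) per million output tokens, ignoring energy."""
    lifetime_tokens = tokens_per_second * AMORTIZATION_YEARS * SECONDS_PER_YEAR
    return GPU_PRICE_USD / lifetime_tokens * 1_000_000

rated = cost_per_million_tokens(tokens_per_second=1_500)      # assumed full-TDP throughput
throttled = cost_per_million_tokens(tokens_per_second=1_050)  # assumed 30% thermal throttle

print(f"rated:     ${rated:.3f} per 1M tokens")
print(f"throttled: ${throttled:.3f} per 1M tokens (+{throttled / rated - 1:.0%})")
```

At these assumptions, a 30 percent throttle raises the capital cost per token by roughly 43 percent, before any energy overhead is counted.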

The data center cooling market is projected to reach $46.3 billion by 2033, expanding at a compound annual growth rate of 19.2 percent, according to Persistence Market Research, which published updated figures on April 28. That growth rate is not driven by incremental demand from traditional enterprise colocation. It is driven by AI training clusters where a single rack can draw 40 to 70 kilowatts, and next-generation deployments are pushing past 100 kilowatts per rack. Air cooling, which dominated data center design for three decades, tops out around 20 to 25 kilowatts per rack before supplemental techniques become mandatory. The thermal math imposes a hard ceiling.
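
Compounding backward from the 2033 figure gives a rough sense of the implied starting point; the base year below is an assumption, since the report's baseline is not stated here.

```python
# Sanity check on the Persistence Market Research projection: $46.3B by
# 2033 at a 19.2% CAGR implies a present-day market size. The base year
# (assumed 2025 here) is not stated in the article.

target_usd_b = 46.3
cagr = 0.192
years = 2033 - 2025  # assumed 8-year horizon

implied_base = target_usd_b / (1 + cagr) ** years
print(f"implied 2025 market: ${implied_base:.1f}B")  # ~ $11.4B
```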

The transition is visible in supply-chain data. Vertiv Holdings, one of the largest providers of power and cooling infrastructure for data centers, reported a $15 billion order backlog and a 252 percent surge in fourth-quarter orders, according to Seeking Alpha in April. The backlog is not speculative. It represents signed contracts for equipment that hyperscalers and neoclouds have committed to deploy, much of it liquid cooling infrastructure that did not exist in commercial volumes three years ago. Lead times for some direct-to-chip cold plates now stretch past 26 weeks, according to industry estimates.

The rack-density escalation has been exceptionally fast. A standard enterprise rack five years ago drew perhaps 5 to 8 kilowatts. An H100 training cluster routinely draws 40 to 50 kilowatts per rack, and Nvidia's GB200 NVL72 configuration, which packs 72 Blackwell GPUs into a single rack-scale system, can draw over 120 kilowatts. At that density, traditional hot-aisle and cold-aisle containment with computer room air conditioning units is physically inadequate. The required airflow volume would demand fans so large and loud that they would violate noise ordinances in most jurisdictions, and the energy overhead of moving that much air would erase the efficiency gains of the GPUs themselves.
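
The ceiling falls out of basic heat-transfer arithmetic. The sketch below applies the standard sensible-heat relation Q = ρ·cp·V·ΔT to estimate the airflow each rack density demands; the 10-kelvin aisle-to-aisle temperature rise is an assumed figure, while the rack powers match those cited above.

```python
# Back-of-envelope airflow requirement for an air-cooled rack, using the
# sensible-heat relation Q = rho * cp * V * dT. The 10 K supply-to-return
# delta-T is assumed; the rack powers come from the article.

RHO_AIR = 1.2      # kg/m^3, air density near sea level
CP_AIR = 1005.0    # J/(kg*K), specific heat of air
DELTA_T = 10.0     # K, assumed cold-aisle to hot-aisle temperature rise

def required_airflow_m3s(rack_watts: float) -> float:
    """Volumetric airflow (m^3/s) needed to absorb rack_watts at DELTA_T."""
    return rack_watts / (RHO_AIR * CP_AIR * DELTA_T)

for kw in (8, 25, 50, 120):
    m3s = required_airflow_m3s(kw * 1000)
    cfm = m3s * 2118.88  # conversion to cubic feet per minute
    print(f"{kw:>4} kW rack -> {m3s:5.1f} m^3/s  (~{cfm:,.0f} CFM)")
```

A GB200-class rack needs roughly five times the airflow of a rack at the 25-kilowatt air-cooling ceiling, which is the physical basis of the hard limit described above.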

Liquid cooling splits into three architectures, each with different economics. Direct-to-chip liquid cooling places cold plates on GPUs and CPUs, removing 70 to 80 percent of the heat at the source. Immersion cooling submerges entire servers in dielectric fluid, capturing over 95 percent of heat but requiring purpose-built hardware and fluid management systems that add capital cost. Rear-door heat exchangers sit on the back of the rack as a retrofit solution, handling 30 to 60 kilowatts per rack. The emerging consensus among hyperscalers, visible in procurement patterns, favors direct-to-chip for new AI deployments, with immersion reserved for the highest-density training clusters.
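
A rough way to compare the three architectures is by the heat each one leaves behind for the room's air handlers. The sketch below uses the capture fractions cited above; the rear-door figure is converted to a fraction by assumption, since the article states it as an absolute kilowatt range.

```python
# Residual air-cooling load left by each liquid-cooling architecture,
# using the heat-capture figures cited in the article. Rack power is a
# parameter; 120 kW matches the GB200 NVL72 figure above.

HEAT_CAPTURE = {               # fraction of rack heat removed by the liquid loop
    "direct-to-chip": 0.75,    # article cites 70-80%; midpoint assumed
    "immersion": 0.95,         # article cites "over 95%"
    "rear-door hx": 0.45,      # assumed: 45% of 120 kW = 54 kW, inside the cited 30-60 kW range
}

def residual_air_load_kw(rack_kw: float, capture: float) -> float:
    """Heat (kW) the room's air handlers must still remove."""
    return rack_kw * (1.0 - capture)

rack_kw = 120.0
for arch, capture in HEAT_CAPTURE.items():
    print(f"{arch:>14}: {residual_air_load_kw(rack_kw, capture):5.1f} kW left to air")
```

The rear-door row shows why it remains a retrofit solution: at GB200 densities it leaves more heat to the air than air cooling can handle, while direct-to-chip at the upper end of its capture range brings the residual load back under the air-cooling ceiling, consistent with the procurement pattern described above.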

The energy consumption picture is starker still at the grid level. Data centers are projected to consume up to 12 percent of total U.S. electricity by 2028, Tech Xplore reported on April 27, up from roughly 4 percent in 2023. Several gigawatt-scale AI campuses are now under construction in Virginia, Texas, and Ohio, and each can consume as much power as a mid-sized city. 24/7 Wall St. reported on May 4 that data center veteran John Perella described a near-miss in Virginia where gigawatt-scale AI buildouts came close to triggering rolling blackouts during peak summer demand.

Power availability, not chip supply, is now the binding constraint on AI infrastructure expansion. Business Insider reported on April 14 that California startup Orbital, founded on the premise that "the bottleneck is no longer chips, it's the power required to run them," has set a date for its first test mission to place AI data centers in low Earth orbit. The company raised funding from Andreessen Horowitz's speedrun accelerator, according to Forbes. The orbital thesis is straightforward: space offers unlimited solar power with no grid interconnection queue, and the vacuum of space provides passive radiative cooling at effectively zero marginal cost.
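
The physics behind passive radiative cooling is the Stefan-Boltzmann law: in vacuum, radiation is the only heat-rejection path. The sketch below estimates the radiator area that implies; the emissivity and radiator temperature are assumed values for illustration, not figures from Orbital.

```python
# In vacuum the only heat-rejection path is radiation, governed by the
# Stefan-Boltzmann law P = epsilon * sigma * A * T^4 (the ~3 K cosmic
# background is negligible). Emissivity and radiator temperature below
# are assumed values, not figures from Orbital or SpaceX.

SIGMA = 5.670e-8    # W/(m^2 K^4), Stefan-Boltzmann constant
EPSILON = 0.90      # assumed radiator emissivity
T_RADIATOR = 300.0  # K, assumed radiator surface temperature (~27 C)

def radiator_area_m2(heat_watts: float) -> float:
    """Radiator area needed to reject heat_watts purely by radiation."""
    flux = EPSILON * SIGMA * T_RADIATOR ** 4  # W per m^2 of radiating surface
    return heat_watts / flux

for mw in (1, 10, 100):
    print(f"{mw:>3} MW -> {radiator_area_m2(mw * 1e6):,.0f} m^2 of radiator")
```

Double-sided panels halve those areas, but the requirement scales linearly with heat load, which is why radiator mass and launch economics dominate the orbital-compute engineering problem.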

Google and SpaceX are pursuing parallel orbital strategies. The Logical Indian reported on April 26 that both companies are exploring orbital data centers to address AI's exploding energy and infrastructure demands. SpaceX's confidential S-1 filing ahead of a June 2026 IPO reportedly references challenges facing orbital AI data centers, MSN reported on May 11. Separately, Space.com reported on May 13 that Cowboy Space Corporation raised $275 million to build rockets specifically designed to launch orbital data center payloads. The first test missions are targeting 2028, though timelines for commercial-scale orbital compute remain speculative.

Meanwhile, the terrestrial response is bifurcating between hyperscale buildouts and efficiency retrofits. Hitachi Vantara released its FY2025 sustainability report on April 22, detailing expanded lifecycle design initiatives, governance improvements, and energy-efficient systems purpose-built for AI workloads. The report reflects a growing recognition among infrastructure vendors that sustainability disclosures are becoming procurement gatekeepers: hyperscale RFPs now routinely require detailed power usage effectiveness commitments, with penalties for exceeding contracted thresholds.

On the deployment side, Yahoo Finance reported on May 10 that Dell Technologies and Nvidia have been selected by TotalEnergies to design and install Pangea 5, a new high-performance computing system. The deal signals that energy companies themselves, whose core product is the input constraint on AI growth, are investing heavily in AI compute. TotalEnergies reportedly plans to use the system for seismic imaging, reservoir simulation, and emissions monitoring, workloads that require both massive GPU density and the cooling systems to sustain them.

The cooling innovation pipeline extends beyond established vendors. PyroDelta Energy, a subsidiary of First Tellurium Corp., has developed thermoelectric technology that converts data center waste heat back into electricity, Goldseiten reported on May 7, citing coverage in IEEE Spectrum magazine. The technology remains pre-commercial, but the principle addresses the fundamental thermodynamic problem of AI data centers: every watt consumed by a GPU becomes heat that must be removed, and every watt spent on removing heat is a watt not spent on compute. Closing even a small fraction of that loop with thermoelectric recovery would change the per-token cost equation.
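
Thermodynamics caps what any such device can recover. The sketch below computes the Carnot ceiling between an assumed 60°C coolant loop and 25°C ambient; PyroDelta's actual operating temperatures and conversion efficiency are not disclosed in the coverage cited.

```python
# Thermodynamic ceiling on waste-heat recovery: no device can beat the
# Carnot efficiency between the heat-source and heat-sink temperatures.
# The coolant and ambient temperatures here are assumptions, as is the
# 20%-of-Carnot device efficiency in the final line.

def carnot_efficiency(t_hot_c: float, t_cold_c: float) -> float:
    """Maximum fraction of heat convertible to work between two temperatures."""
    t_hot_k, t_cold_k = t_hot_c + 273.15, t_cold_c + 273.15
    return 1.0 - t_cold_k / t_hot_k

eta_max = carnot_efficiency(t_hot_c=60.0, t_cold_c=25.0)  # warm-water loop vs ambient
print(f"Carnot ceiling: {eta_max:.1%}")                   # ~10.5%

recovered_mw = 100 * eta_max * 0.2  # assumed 20%-of-Carnot thermoelectric device
print(f"Illustrative recovery at 100 MW load: {recovered_mw:.1f} MW")
```

Even at the modest fractions of Carnot that real thermoelectric generators achieve, a gigawatt-scale campus would return megawatts, which is why closing even a small fraction of the loop matters for per-token cost.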

For enterprises outside the hyperscale tier, energy cost volatility is becoming a boardroom concern. ET CIO reported on May 13 that CFOs and CIOs across Indian enterprises are increasingly aligning on energy analytics, cloud optimization, and operational efficiency initiatives aimed at protecting margins in an inflationary environment. The article notes that energy costs now rank among the top three variables in cloud workload placement decisions, alongside latency and data sovereignty. This is a structural shift: five years ago, energy was a rounding error in most enterprise IT budgets.

The per-token economics are where the cooling conversation gets granular. An inference call to a large language model running on H100 GPUs at batch size 1 consumes significantly more energy per token than the same call at batch size 32, because the GPU's fixed power draw is amortized across fewer output tokens. The cooling energy overhead compounds this inefficiency. A facility with a power usage effectiveness ratio of 1.4 consumes 0.4 watts of overhead for every watt of compute. At a PUE of 1.1, achievable with warm-water direct-to-chip cooling, the overhead drops to 0.1 watts. For a model provider serving millions of inference requests daily, that delta scales into millions of dollars in annual energy costs.
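
A worked version of that delta, using the article's PUE figures and assumed round numbers for per-GPU power, throughput, fleet size, and electricity price:

```python
# Worked version of the PUE delta in the paragraph above. The PUE values
# (1.4 vs 1.1) come from the article; per-GPU power, throughput, fleet
# size, and electricity price are assumed round numbers for illustration.

GPU_POWER_KW = 0.7        # assumed H100-class draw at the wall
TOKENS_PER_SEC = 1_500    # assumed batched inference throughput per GPU
PRICE_PER_KWH = 0.08      # assumed industrial electricity rate, USD
FLEET_GPUS = 10_000       # assumed inference fleet size
HOURS_PER_YEAR = 8_760

def annual_energy_cost(pue: float) -> float:
    """Yearly electricity cost (USD) for the fleet at a given facility PUE."""
    it_kwh = FLEET_GPUS * GPU_POWER_KW * HOURS_PER_YEAR
    return it_kwh * pue * PRICE_PER_KWH

delta = annual_energy_cost(1.4) - annual_energy_cost(1.1)
kwh_per_million_tokens = (GPU_POWER_KW * 1.4) / (TOKENS_PER_SEC * 3600) * 1e6
print(f"annual cost delta (PUE 1.4 vs 1.1): ${delta:,.0f}")
print(f"kWh per 1M tokens at PUE 1.4: {kwh_per_million_tokens:.3f}")
```

At these assumptions, a 10,000-GPU fleet saves roughly $1.5 million a year moving from a PUE of 1.4 to 1.1, and the same arithmetic produces the kilowatt-hours-per-million-tokens metric whose absence the next paragraph describes.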

What remains absent from the public record is a standardized per-token energy cost disclosure from major model providers. No frontier lab currently publishes the average kilowatt-hours consumed per million tokens of output, broken down by model size, batch size, and hardware configuration. Without that data, the market cannot price the energy risk embedded in inference contracts. A customer buying tokens from one provider at $15 per million tokens and another at $12 per million tokens has no way to assess whether the cheaper provider is operating more efficient infrastructure or simply eating margin that will be clawed back in future price increases when energy costs rise.

The cooling market's trajectory will be determined by three variables in 2026 and 2027. First, whether the major GPU vendors ship next-generation chips that run cooler, reducing the thermal burden at the source. Second, whether utility interconnection timelines improve or deteriorate, which determines how many gigawatt-scale campuses actually reach commercial operation before 2028. Third, whether the per-token price of inference continues its current decline or plateaus under the weight of energy and cooling costs that cannot be engineered away. On that third variable, the public record is silent. No major cloud provider has published a forward curve for per-token pricing that separates compute costs from energy costs. That silence is worth watching.
