AI Data Centers to Consume Up to 12% of U.S. Power by 2028 as Cooling Costs Surge
Hyperscalers are pouring $700 billion into 2026 data center capex while the cooling market grows 19.2 percent annually, reshaping per-token energy costs for inference.
Data centers will consume up to 12 percent of total United States electricity by 2028, according to an estimate published by Tech Xplore on April 27, 2026, citing research that tracks the explosive growth of artificial intelligence workloads. That figure represents a tripling in just four years. For the compute economics beat, it reframes every conversation about inference pricing: energy is no longer a line item buried in a hyperscaler's annual report. It is the largest variable cost in the per-token supply chain, and it is rising faster than GPU depreciation schedules can offset.
The scale of capital flowing into AI data centers in 2026 is difficult to overstate. Hyperscale cloud and AI providers are committing over $700 billion this year alone to expand data center capacity, MSN reported on May 2, citing industry data. J.P. Morgan projects that top U.S. cloud providers will increase AI data center capital spending by more than $200 billion in 2026, marking the largest annual increase to date, according to the same outlet. Demand for utility crews, underground infrastructure work, and smart equipment is growing in lockstep, creating what the Coeur d'Alene Press described on May 8 as a massive construction boom.
Buried inside that $700 billion figure is a cooling market that is quietly undergoing its own revolution. The global data center cooling market is expected to be valued at approximately $13.6 billion in 2026 and is projected to reach $46.3 billion by 2033, a compound annual growth rate of 19.2 percent, according to a report from Persistence Market Research published on April 28 and syndicated by FinanzNachrichten.de. That growth rate outpaces even the bullish forecasts for GPU shipments. The reason is straightforward: traditional air-cooled racks top out around 20 to 30 kilowatts per rack. AI training clusters routinely exceed 60 kilowatts per rack. A single Nvidia GB200 NVL72 rack can draw over 120 kilowatts. Air cannot move that much heat. Liquid must step in.
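The compounding behind that projection is easy to verify. A quick sanity check in Python, using only the figures the report cites (the sketches that follow use the same back-of-envelope style):

```python
# Sanity check: does $13.6B in 2026 compound to ~$46.3B by 2033 at 19.2%?
start_bn = 13.6          # projected 2026 cooling market, $B
cagr = 0.192             # 19.2% compound annual growth rate
years = 2033 - 2026      # seven-year horizon

print(f"Implied 2033 market: ${start_bn * (1 + cagr) ** years:.1f}B")
# -> ~$46.5B, consistent with the report's $46.3B figure
```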
The shift from air to liquid cooling is the single largest infrastructure transformation in data center engineering since the virtualization era. Companies like CoolIT Systems, acquired by Ecolab in March 2026, and PyroDelta are racing to deploy direct-to-chip and immersion cooling solutions at scale, according to MSN. At Data Center World 2026, industry leaders showcased designs for megawatt-scale racks with integrated cooling and power systems, reflecting a fundamental redesign prompted by generative and agentic AI, MSN reported on April 27.
The physics driving this transition is unforgiving. Power Usage Effectiveness, or PUE, is the ratio of total facility energy to the energy consumed by IT equipment. A PUE of 1.0 would mean every watt entering the building reaches a chip. No real data center achieves that. In 2007, the industry average was approximately 2.0, meaning half of all power was wasted on cooling, lighting, and conversion losses. Hyperscalers like Google and Microsoft now report annualized PUE figures around 1.10 to 1.12 for their most efficient facilities. But those numbers are for cloud-era workloads with predictable, moderate-density racks. AI clusters push PUE upward because liquid cooling pumps, coolant distribution units, and higher power densities introduce new overhead. The cooling infrastructure itself consumes power. At rack densities above 60 kilowatts, every 0.01 of PUE degradation adds on the order of $500 to $1,000 in annual electricity cost per rack, which compounds to tens of thousands of dollars a year across even a modest hall of racks.
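The per-rack arithmetic is simple enough to sketch. Assuming a 120-kilowatt GB200-class rack and a $0.10-per-kilowatt-hour industrial rate, both illustrative:

```python
# Annual cost of facility overhead implied by PUE, per rack.
# Everything above PUE = 1.0 is cooling, conversion, and other losses.
HOURS_PER_YEAR = 8760

def overhead_cost_per_rack(it_kw: float, pue: float, usd_per_kwh: float) -> float:
    """Annual dollars spent on facility overhead for one rack."""
    overhead_kw = it_kw * (pue - 1.0)
    return overhead_kw * HOURS_PER_YEAR * usd_per_kwh

rack_kw, rate = 120.0, 0.10   # assumed GB200-class rack, industrial rate
for pue in (1.10, 1.11, 1.15):
    print(f"PUE {pue:.2f}: ${overhead_cost_per_rack(rack_kw, pue, rate):,.0f} per rack per year")
# Each 0.01 of PUE adds ~$1,050/year for this rack; a 100-rack hall
# loses roughly $105,000/year for every 0.01 of degradation.
```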
The water dimension is equally material and less widely discussed. Many data centers rely on evaporative cooling to reject heat. In Wisconsin, where Microsoft is expanding its hyperscale footprint, the Milwaukee Journal Sentinel reported on April 9 that the data center boom is raising concerns about freshwater supply, with facilities requiring enormous amounts of water for cooling alongside electricity. In Florida, USA Today reported on April 22 that data center projects are facing community pushback over the strain they place on local water and energy resources. A midsize hyperscale data center can consume one million to five million gallons of water per day for cooling, depending on climate and cooling architecture. Liquid cooling, particularly closed-loop direct-to-chip systems, reduces water consumption relative to evaporative air cooling. The tradeoff is higher capital cost and more complex maintenance.
"Until recently, the conversation around AI growth has focused almost exclusively on compute and GPU availability. But the real constraint emerging now is not about silicon. It is about power." (Forbes Business Council, March 26, 2026)
That constraint is already being priced into the market in visible ways. Core Scientific, a major bitcoin mining operator pivoting to AI hosting, is preparing to raise $3.3 billion through a junk bond sale to fund its transition toward AI-focused data center operations, CoinDesk reported on April 21. The company is betting that AI customers will pay a premium for ready-to-occupy, high-density, liquid-cooled capacity. That bet is almost certainly correct. The time to build a new hyperscale data center from greenfield to commissioning now exceeds 24 months in most major markets. In Northern Virginia, the densest cluster of data centers on Earth, utility Dominion Energy has a queue of facilities waiting years for grid connections. Supply is inelastic in the medium term. Prices for colocation in a liquid-cooled, AI-capable facility are rising faster than GPU cloud instance prices.
For the per-token economy, what matters is how these infrastructure costs trickle down to inference pricing. The arithmetic is not as simple as dividing a facility's monthly power bill by tokens served. Inference workloads are spiky. A model serving API calls at 3 a.m. draws far less power than the same cluster handling a peak traffic burst at noon. Cooling systems must be sized for peak load, but paid for across average utilization. The gap between peak and average is where margins are made or lost. Cloud providers have spent a decade optimizing general-purpose compute for average utilization rates around 50 to 60 percent. AI inference clusters, especially those serving agentic workloads with unpredictable burst patterns, often run at 30 to 40 percent GPU utilization while drawing near-peak power. That is an expensive mismatch.
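A rough way to see the cost of that mismatch: compare the energy bill per useful rack-hour under cloud-era and agentic-inference utilization. The power-draw fractions below are illustrative assumptions, not measured figures:

```python
# Energy cost per *useful* rack-hour when power draw stays near peak
# but only a fraction of capacity does billable work.
def cost_per_useful_hour(peak_kw: float, power_fraction: float,
                         utilization: float, usd_per_kwh: float) -> float:
    # Billed energy per wall-clock hour, spread over the utilized fraction.
    return (peak_kw * power_fraction * usd_per_kwh) / utilization

peak_kw, rate = 120.0, 0.10   # GB200-class rack, assumed industrial rate
cloud   = cost_per_useful_hour(peak_kw, power_fraction=0.75, utilization=0.55, usd_per_kwh=rate)
agentic = cost_per_useful_hour(peak_kw, power_fraction=0.90, utilization=0.35, usd_per_kwh=rate)
print(f"cloud-era workload: ${cloud:.2f} per useful rack-hour")   # ~$16.40
print(f"agentic inference:  ${agentic:.2f} per useful rack-hour") # ~$30.90, nearly 2x
```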
The mismatch shows up in the hardware choices providers are making. Nvidia's latest MLPerf inference submissions, analyzed by The Next Platform on April 2, show Blackwell-based systems delivering dramatically higher tokens-per-second-per-watt than prior generations. But the absolute power draw of these systems is also higher. A fully populated GB200 NVL72 rack can exceed 130 kilowatts at peak load. The efficiency gains come from doing more work per watt, not from drawing fewer watts. For a model provider paying $0.08 to $0.12 per kilowatt-hour at industrial rates, the energy cost per million tokens served is now measurable in cents, not fractions of cents. For a 70-billion-parameter model running inference at scale, energy alone can account for 15 to 25 percent of the fully loaded cost per token, depending on batch size, sequence length, and hardware generation. Cooling adds another 5 to 15 percent on top of the IT power cost, depending on PUE and cooling architecture.
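Working the numbers bottom-up shows why cents per million tokens is the right order of magnitude. The throughput figure below is a placeholder assumption for a heavily batched 70B-class deployment, not a vendor benchmark:

```python
# Facility-level energy cost per million tokens served.
# Assumptions (illustrative): 120 kW rack, 300,000 aggregate tokens/sec,
# $0.10/kWh industrial rate, PUE of 1.12.
def energy_cost_per_mtok(rack_kw: float, tokens_per_sec: float,
                         usd_per_kwh: float, pue: float) -> float:
    joules_per_token = (rack_kw * 1000.0) / tokens_per_sec   # W / (tok/s) = J/token
    kwh_per_mtok = joules_per_token * 1e6 / 3.6e6            # J -> kWh over 1M tokens
    return kwh_per_mtok * usd_per_kwh * pue                  # scale IT power by PUE

cost = energy_cost_per_mtok(rack_kw=120, tokens_per_sec=300_000,
                            usd_per_kwh=0.10, pue=1.12)
print(f"~${cost:.3f} per million tokens")  # ~$0.012: cents, not fractions of cents
```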
The Persistence Market Research data on the cooling market, syndicated by TMCnet on April 28, breaks down the market by technology: liquid cooling solutions, including direct-to-chip and immersion, are growing faster than the overall 19.2 percent rate. Air conditioning and air handling solutions still hold the largest revenue share today but are losing ground. The report highlights that AI-driven workloads, with their sustained high GPU utilization and dense packaging, are the primary driver of liquid cooling adoption. The same report notes that North America holds the largest market share, driven by the concentration of hyperscale data center construction in the United States.
There is a geographic wrinkle worth tracking. Data center construction in 2026 is not spreading evenly. The Coeur d'Alene Press report on May 8 emphasized that the construction boom is pulling utility crews and underground infrastructure contractors into markets that lack the skilled labor to keep pace. Sites in Ohio, Texas, Arizona, and the Pacific Northwest are competing for the same limited pool of electrical engineers, pipefitters, and commissioning agents. Labor constraints feed back into cooling costs: a delayed liquid cooling installation can push a facility's commissioning date by months, which in turn delays the revenue that justifies the capital outlay. The financing math on a $1 billion data center does not tolerate a six-month delay in cooling system deployment.
What This Means For The Per-Token Stack
The question that matters for inference economics is how these costs flow through the supply chain. A chip designer like Nvidia captures margin on the GPU. A cloud provider captures margin on the compute instance. But energy and cooling costs sit between them, and neither player fully controls them. A model provider like Anthropic or OpenAI running inference on rented capacity pays for energy indirectly through cloud pricing. A self-hosted deployer pays the utility bill directly. In both cases, energy is the least negotiable line item. You cannot get a volume discount on electrons the way you can on reserved instances.
The market is responding with vertical integration. Dell Technologies and Nvidia were chosen by TotalEnergies to design and install Pangea 5, a new high-performance computing installation, Yahoo Finance reported on May 10. The deal signals that energy companies want to own the full stack from generation to GPU, and that hardware vendors are willing to co-design around power and cooling constraints rather than treating them as afterthoughts. TotalEnergies is not a cloud provider. It is an energy company that has decided that AI compute is a power business with a software problem attached, rather than the reverse.
For the customers at the end of the chain, the implication is that per-token pricing has a floor set by physics, not by competitive dynamics. Even if GPU prices fall and model architectures become more efficient, the energy cost of running a forward pass through a transformer does not approach zero. Every token requires a certain number of floating-point operations. Every floating-point operation dissipates heat. Every watt of heat must be removed. The cooling market growing at 19.2 percent annually is a proxy for how much heat the industry expects to generate. It is also a proxy for how much cost will flow through the infrastructure layer before it ever reaches a customer invoice.
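The floor itself can be sketched from first principles. The rule of thumb for a dense transformer is roughly two floating-point operations per parameter per generated token; the efficiency figure below is an assumed delivered rate, not a hardware spec:

```python
# Lower bound on energy per token for a dense transformer forward pass.
params = 70e9                     # 70B-parameter model
flops_per_token = 2 * params      # ~1.4e11 FLOPs per generated token (rule of thumb)
delivered_flops_per_joule = 2e11  # assumed achieved efficiency at inference precision

joules_per_token = flops_per_token / delivered_flops_per_joule
kwh_per_mtok = joules_per_token * 1e6 / 3.6e6
print(f"{joules_per_token:.2f} J/token, {kwh_per_mtok:.3f} kWh per million tokens")
# Better silicon raises FLOPs-per-joule, but the FLOP count per token is
# fixed by the architecture: costs fall toward a floor, not toward zero.
```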
Checkpoint: monitor the PUE disclosures from the major hyperscalers in their next quarterly filings. If AI-specific facilities start reporting PUE figures above 1.15, that is a signal that rack densities are outpacing cooling efficiency and that per-token energy costs are trending higher, not lower. The second checkpoint is water usage. Community opposition in Wisconsin and Florida may not stop data center construction, but it will raise permitting timelines and costs. Longer timelines mean tighter supply. Tighter supply means the $46.3 billion cooling market projection for 2033 may prove conservative.