565 TWh and Rising: AI Cooling Crisis Spurs $29.2B Liquid Cooling Race

On June 12, research firm Gartner dropped a number that reshapes every conversation about AI infrastructure: global data centre electricity consumption will reach 565 terawatt-hours in 2026, a 26% increase over 2025, as Energy Digital reported. For scale, 565 TWh is more than the annual electricity consumption of France. The forecast frames the question that everyone from chip designers to utility regulators now has to answer: can the grid and the cooling plant keep pace with the GPU?

Not all of that 565 TWh belongs to AI. But a growing and disproportionate share does. Training runs for frontier models routinely consume tens of thousands of GPU-hours across clusters drawing multiple megawatts. Inference, once dismissed as lightweight, now runs 24/7 at scale across millions of tokens per second, with each H100 pulling upwards of 700 watts at peak load. The power draw of a single rack of eight H100s exceeds 10 kW, at the outer edge of what traditional air cooling can handle before the facility hits thermal saturation.

That is where the market data gets blunt. Persistence Market Research valued the global data centre liquid cooling market at approximately $5.7 billion in 2026 and projects it will reach $29.2 billion by 2033, a compound annual growth rate of roughly 26%. Grand View Research, publishing separately the same week, pegged the figure at $29.5 billion by 2033 with a 20.1% CAGR, as TMCnet reported. The two firms differ on the exact multiplier but agree on the shape of the curve: exponential, and not slowing.
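The growth rate quoted for the Persistence Market Research forecast can be checked from its two endpoints. The sketch below assumes the base year is 2026 and the end year is 2033, giving seven compounding periods; the function name `cagr` is illustrative and not taken from either research firm.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.26 means 26%)."""
    return (end_value / start_value) ** (1 / years) - 1


# Persistence Market Research figures quoted above, in billions of US dollars.
start_2026 = 5.7
end_2033 = 29.2
years = 2033 - 2026  # assumption: seven compounding periods

rate = cagr(start_2026, end_2033, years)
print(f"Implied CAGR: {rate:.1%}")
```

The result lands at roughly 26%, in line with the rate quoted alongside the forecast.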

The reason for the consensus is physics. Nvidia's B200 GPU, the successor to the H100, reportedly pushes per-socket thermal design power well past 1,000 watts. A rack populated with B200s could exceed 40 kW of heat dissipation. Air cooling, even with hot-aisle containment and raised-floor optimisation, tops out between 20 and 35 kW per rack in most retrofit scenarios. Liquid cooling, whether direct-to-chip cold plates or immersion, absorbs heat roughly 3,000 times more efficiently than air on a per-volume basis. For AI clusters, the arithmetic is settled.
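The per-volume comparison can be reproduced with textbook fluid properties. This is a minimal sketch that uses plain water as the liquid (dielectric fluids generally carry less heat per unit volume) and assumes a 10 K coolant temperature rise across the rack; both choices are illustrative assumptions rather than figures from the reports cited here.

```python
# Approximate room-temperature properties (textbook values).
WATER_DENSITY = 997.0  # kg/m^3
WATER_CP = 4182.0      # J/(kg*K)
AIR_DENSITY = 1.2      # kg/m^3
AIR_CP = 1005.0        # J/(kg*K)


def volumetric_heat_capacity(density: float, cp: float) -> float:
    """Heat absorbed per cubic metre per kelvin, in J/(m^3*K)."""
    return density * cp


def volume_flow_for_heat(heat_w: float, density: float, cp: float,
                         delta_t_k: float) -> float:
    """Coolant flow in m^3/s needed to carry heat_w watts: Q = rho * V * cp * dT."""
    return heat_w / (density * cp * delta_t_k)


ratio = (volumetric_heat_capacity(WATER_DENSITY, WATER_CP)
         / volumetric_heat_capacity(AIR_DENSITY, AIR_CP))

rack_heat_w = 40_000.0  # the 40 kW rack mentioned above
delta_t_k = 10.0        # assumed coolant temperature rise

water_flow_l_s = 1000 * volume_flow_for_heat(
    rack_heat_w, WATER_DENSITY, WATER_CP, delta_t_k)
air_flow_m3_s = volume_flow_for_heat(
    rack_heat_w, AIR_DENSITY, AIR_CP, delta_t_k)

print(f"Water vs air, heat per unit volume: {ratio:,.0f}x")
print(f"Water flow for 40 kW: {water_flow_l_s:.2f} L/s")
print(f"Air flow for 40 kW:   {air_flow_m3_s:.2f} m^3/s")
```

Under these assumptions the ratio comes out near 3,500, and the same 40 kW rack needs about one litre of water per second versus more than three cubic metres of air per second.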

The cooling market breakdown shows direct-to-chip solutions holding the largest revenue share in 2026, favoured by hyperscalers who need to retrofit existing air-cooled facilities without a full teardown. Immersion cooling, where servers are submerged in dielectric fluid, is growing faster from a smaller base, with Persistence Market Research noting particular demand in greenfield data centre projects where developers can design the tank infrastructure from the slab up. Rear-door heat exchangers, long a staple of enterprise data centres, are finding a second life as a bridge technology for AI inference clusters that run at lower utilisation than training.

At Computex Taipei 2026, Seeking Alpha reported that Dow showcased a full portfolio of thermal management materials aimed specifically at AI data centres: dielectric cooling fluids, thermal interface materials for GPU-to-cold-plate contact, and advanced packaging compounds designed for chiplet architectures where thermal density concentrates in small die areas. The presence of a materials giant at a computing trade show signals how far downstream the AI cooling problem has travelled. It is no longer a facility design question. It is a chemical engineering question.

The power side of the equation is equally strained. Gartner explicitly noted that power availability is "now impacting growth," a phrase that signals a shift from forecasting to constraint. Grid interconnection queues in key data centre markets, particularly Northern Virginia and Dublin, stretch beyond five years. Utilities in Arizona, Texas, and the Pacific Northwest are receiving interconnection requests for single-site loads exceeding 500 MW, the equivalent of a medium-sized steel mill or aluminium smelter. The data centre industry's appetite is beginning to compete with industrial incumbents for the same transformer and transmission capacity.

The environmental dimension extends beyond electricity. eWeek reported on a United Nations University study warning that AI data centres' footprint now encompasses water consumption, land use, and disclosure gaps alongside energy. A Reuters dispatch, syndicated through the AP, noted that data centres are projected to consume twice as much power and water by 2030. Water enters the equation primarily through evaporative cooling towers, which can consume millions of gallons annually per large facility. A mid-sized AI training cluster using air-cooled chillers supplemented by evaporation can draw over 300,000 gallons per day during peak summer months. Liquid cooling changes the water calculus: closed-loop direct-to-chip systems reduce or eliminate evaporative losses, but immersion cooling tanks require large initial fills of dielectric fluid, shifting the environmental burden from ongoing water consumption to chemical manufacturing and disposal.

Amid the flood of market projections and environmental warnings, the infrastructure industry has moved to establish a shared baseline. On June 10, the National Electrical Manufacturers Association, ASHRAE, and the Pacific Northwest National Laboratory jointly released the AI Data Center Energy Performance Framework, as Contractor Magazine reported. The framework covers design, development, and operational phases, with specific guidance on power usage effectiveness targets for AI workloads, thermal management standards for liquid-cooled racks, and resilience metrics for facilities where compute availability directly governs revenue.

"At a time when the market is coalescing around the value of 'speed to power' and increasingly pursuing innovative pathways to project energization, the AI Data Center Energy Performance Framework provides an authoritative set of resources for developers and facility managers." (NEMA, ASHRAE, PNNL joint statement, June 2026)

The framework matters because the status quo, each hyperscaler and colocation provider defining its own cooling KPIs, makes it impossible to compare claims across facilities. One operator's "liquid-cooled AI data centre" might mean direct-to-chip on 20% of racks with the balance on air. Another's might mean full-immersion. Without a common language, the market cannot price cooling risk, and customers cannot diligence it. The NEMA-ASHRAE-PNNL framework is not a regulation, but it creates the taxonomy that regulation and procurement contracts will eventually reference.

A Forbes Tech Council piece published two days later reinforced the same gap between perception and reality, arguing that most public discussion of AI data centres lacks visibility into how facilities actually operate and what AI workloads genuinely require. The article noted that training and inference have fundamentally different cooling profiles: training runs are batch-oriented, predictable, and concentrated in time, while inference is continuous, latency-sensitive, and geographically distributed. Cooling infrastructure that optimises for one may be wasteful for the other.

This distinction has investment implications. A liquid cooling system sized for peak training load may run at 30% utilisation during inference-dominant periods, dragging down the facility's overall power usage effectiveness. Hyperscalers with diversified workloads can smooth this curve. Smaller AI cloud providers and neoclouds, whose revenue depends on whichever workload their customers submit, face lumpier cooling demand and higher per-kilowatt costs. The Persistence Market Research data captures total addressable market but does not disaggregate by workload type, a gap that procurement teams are beginning to notice.

On the materials side, the Dow presentation at Computex highlighted a less visible but critical constraint: thermal interface materials, or TIMs. These are the compounds sandwiched between a GPU die and the cold plate that draws heat into the liquid loop. As die sizes shrink and power densities rise (the B200's compute die is reportedly smaller than the H100's despite higher thermal output), the heat flux per square millimetre climbs to levels that challenge even high-performance TIMs. Dow's pitch was that advanced silicone-based and hybrid materials can sustain thermal conductivity above 10 watts per metre-Kelvin while surviving the thermal cycling that comes from GPUs repeatedly ramping from idle to full load and back. It is a specification sheet detail, but one that sets the ceiling on how dense a rack can get before the cold plate stops being able to pull heat out fast enough.

For the broader compute market, the cooling cost is beginning to show up in per-token economics. While most discussion of inference pricing focuses on GPU SKU and batch size, cooling represents a meaningful share of total cost of ownership at scale. A back-of-the-envelope calculation: a data centre with a PUE of 1.4 spends an additional 40% of its IT energy on non-compute loads, primarily cooling, which works out to roughly 29% of total facility energy. Liquid cooling can drive PUE below 1.1 in optimal configurations, cutting that overhead to under 10% of IT energy and, within a fixed power envelope, freeing roughly 27% more capacity for compute. For an inference provider billing per million tokens, that delta directly widens gross margin. The providers who deploy liquid cooling first capture a structural cost advantage that is invisible to most API customers but decisive at the margin.
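PUE is total facility energy divided by IT energy, so the overhead can be expressed either relative to IT energy or as a share of the facility total, and the two readings give different percentages. The sketch below works through both for the PUE values of 1.4 and 1.1 discussed above. The 100 MW facility envelope is a hypothetical figure chosen only to make the numbers concrete.

```python
def overhead_share_of_total(pue: float) -> float:
    """Fraction of total facility energy spent on non-IT loads."""
    return (pue - 1) / pue


def it_power_available(facility_power_mw: float, pue: float) -> float:
    """IT power that fits inside a fixed facility power envelope."""
    return facility_power_mw / pue


facility_mw = 100.0  # hypothetical grid connection
air_pue = 1.4
liquid_pue = 1.1

share_air = overhead_share_of_total(air_pue)
share_liquid = overhead_share_of_total(liquid_pue)
it_air = it_power_available(facility_mw, air_pue)
it_liquid = it_power_available(facility_mw, liquid_pue)
compute_gain = it_liquid / it_air - 1

print(f"Overhead share of total at PUE 1.4: {share_air:.1%}")
print(f"Overhead share of total at PUE 1.1: {share_liquid:.1%}")
print(f"IT power at PUE 1.4: {it_air:.1f} MW")
print(f"IT power at PUE 1.1: {it_liquid:.1f} MW")
print(f"Extra compute in the same envelope: {compute_gain:.1%}")
```

With the grid connection held constant, the move from 1.4 to 1.1 raises usable IT power by the ratio 1.4 / 1.1, independent of the facility size assumed.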

The grid side is where the numbers get hardest. The IEA projects data centre electricity demand could reach 945 TWh annually by 2030, as New Scientist reported. That figure assumes continued AI adoption at current growth rates and does not account for a step change in model scale or inference volume. At 945 TWh, data centres would consume roughly 3% of global electricity, up from an estimated 1-1.5% in 2024, according to the United Nations University report summarised by Tech Xplore. Three percent may sound manageable, but it concentrates in specific geographies: Northern Virginia alone hosts over 300 data centres, and the local utility Dominion Energy has repeatedly revised its load forecasts upward. In Ireland, data centres accounted for 21% of all metered electricity consumption in 2024, a figure the national grid operator expects to exceed 30% by 2028.

What the framework actually changes

The NEMA-ASHRAE-PNNL framework is most specific in its treatment of partial-load efficiency. AI data centres, unlike traditional colocation facilities, rarely run at steady state. Training jobs spike power draw to near-maximum for days or weeks, then drop to near-idle. The framework introduces a metric called Cooling System Effectiveness Ratio that requires measurement across at least three load bands, 25%, 50%, and 100% of design capacity, rather than a single snapshot at peak. This is an acknowledgement that the old PUE benchmark, captured at whatever moment a facility happens to measure it, conceals the inefficiency that emerges when AI workloads fluctuate.
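The formula for the Cooling System Effectiveness Ratio is not given here, so the sketch below does not attempt to compute it. Instead it uses plain PUE and a toy overhead model (a fixed component plus a component proportional to IT load) to illustrate why measuring at the 25%, 50%, and 100% bands reveals what a single peak snapshot hides. Every constant in the block is a hypothetical value.

```python
# Toy model: all values below are hypothetical, for illustration only.
DESIGN_IT_MW = 10.0                 # IT load at 100% of design capacity
FIXED_OVERHEAD_MW = 0.8             # pumps, fans, controls that run regardless of load
VARIABLE_OVERHEAD_PER_IT_MW = 0.10  # overhead that scales with IT load
LOAD_BANDS = (0.25, 0.50, 1.00)     # the three bands named in the text


def pue_at_load(load_fraction: float) -> float:
    """PUE when the facility runs at load_fraction of its design IT load."""
    it_mw = DESIGN_IT_MW * load_fraction
    overhead_mw = FIXED_OVERHEAD_MW + VARIABLE_OVERHEAD_PER_IT_MW * it_mw
    return (it_mw + overhead_mw) / it_mw


pue_by_band = {band: pue_at_load(band) for band in LOAD_BANDS}

for band, pue in pue_by_band.items():
    print(f"{band:.0%} load: PUE {pue:.2f}")
```

In this model the facility reports a PUE of 1.18 at full load but 1.42 at quarter load, because the fixed overhead is spread over less compute. A snapshot taken at peak would show only the first number.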

The framework also addresses water, incorporating a water usage effectiveness metric alongside the existing power metric and recommending closed-loop cooling where local water stress exceeds a defined threshold. This aligns with the UN University's finding that water consumption by data centres is rising faster than energy consumption in some regions, a pattern that is especially acute in drought-prone data centre clusters like Phoenix, Arizona, and central Spain.
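Water usage effectiveness is conventionally expressed as litres of site water per kilowatt-hour of IT energy. As a rough sketch, the block below applies that ratio to the 300,000 gallons per day peak figure cited earlier. The 20 MW IT load is a hypothetical assumption, the result is a one-day snapshot rather than the annual figure the metric normally uses, and the framework's own threshold values are not reproduced because they are not stated here.

```python
LITRES_PER_US_GALLON = 3.785411784


def wue_litres_per_kwh(water_litres: float, it_energy_kwh: float) -> float:
    """Water usage effectiveness: site water use divided by IT energy."""
    return water_litres / it_energy_kwh


water_gallons_per_day = 300_000.0  # peak summer figure cited in the article
it_load_mw = 20.0                  # hypothetical IT load for a mid-sized cluster

water_litres_per_day = water_gallons_per_day * LITRES_PER_US_GALLON
it_energy_kwh_per_day = it_load_mw * 1000 * 24  # MW -> kW, times 24 hours

wue = wue_litres_per_kwh(water_litres_per_day, it_energy_kwh_per_day)
print(f"Daily water use: {water_litres_per_day:,.0f} L")
print(f"Peak-day WUE:    {wue:.2f} L/kWh")
```

A closed-loop system with no evaporative losses would push the numerator, and therefore the ratio, toward zero for the same IT load.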

What to watch

Three checkpoints will determine whether the cooling industry's projections hold. First, the B200 ramp. If Nvidia's next-generation GPU ships at volume in late 2026 as expected, the per-rack thermal ceiling jumps overnight, and every air-cooled facility hosting AI workloads faces a forced migration decision. Second, the interconnection queue. Utility regulators in Virginia and Texas are under pressure to accelerate grid connections for data centre projects; if they do not, the bottleneck will shift from cooling technology to power delivery, and the $29.2 billion liquid cooling market forecast will prove optimistic simply because the facilities it presupposes will not get built. Third, the inference shift. If models continue shrinking and moving to device (Apple's on-device large language model strategy is one signal), the centre of gravity for AI compute may shift from a few thousand hyperscale training clusters to millions of distributed inference endpoints, each with modest cooling needs. In that scenario, the liquid cooling market still grows, but its composition tilts from immersion tanks toward compact cold-plate modules that fit inside edge micro-data-centres.

The 565 TWh figure from Gartner is a headline number. But like most energy statistics, it is an aggregate. The real story, visible in the cooling market data and the framework documents and the materials science on display at Computex, is that AI's energy profile is becoming more concentrated, not less. A single rack of B200s will draw more power than an entire data centre hall from a decade ago. Cooling that rack, reliably and continuously, is not a facilities afterthought. It is the precondition for every inference call and training run that the AI industry has already sold.
