Compute & Inference Economics

H100 Reserved Instance Prices Hit $2.35/Hour, Spot Spread Widens

Reserved H100 pricing surged 38% in six months, but the widening spread between spot and reserved GPU instances reveals a deeper supply-demand imbalance in the AI infrastructure market.


As of March 2026, a one-year reserved H100 instance costs $2.35 per GPU-hour. That is the figure published by SemiAnalysis in early April, and it represents a 38% increase from $1.70 per GPU-hour in October 2025. The assumption set: Nvidia H100 SXM, 80 GB, on a one-year committed contract, bare-metal or equivalent, excluding the hyperscaler markup that AWS, Azure, and GCP layer on top. It is the wholesale price, and it is climbing faster than in any quarter since the post-ChatGPT allocation panic of mid-2023. The number alone does not capture the shape of the market, but it is the number every CFO negotiating a renewal this quarter will use as the baseline.

What matters now is the spread between reserved and spot, and the widening gap between the price a neocloud quotes on its website and the price you actually pay when capacity is tight. "GPU pricing remains broken," wrote TechSpot in its Q2 2026 GPU pricing survey, noting that while prices have stopped their parabolic rise, they have not corrected downward either. A floor has formed, and it is a high floor.

The spot versus reserved dynamic in compute is not new. AWS invented the reserved instance in 2009. What is new is the scale of the premium and who controls the allocation. Reserved contracts, which require 12-month or 36-month commitments and upfront payment of 30% to 50%, now cover roughly 70% of the H100 and H200 capacity that moves through neocloud channels, according to figures shared by two infrastructure procurement leads at AI-native startups. Spot capacity, the remainder, is effectively auctioned in real time. The spot clearing price on a Tuesday morning for a single 8-GPU node with InfiniBand can be 45% higher than the price the same node cleared at on a Friday evening, simply because a single large training run landed on the order book.
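
A toy uniform-price auction makes the mechanics concrete. In the sketch below, the clearing price is the lowest accepted bid, so one large order landing near the top of the book reprices every node beneath it. All bid figures are invented, chosen only to reproduce the reported 45% swing:

    # Toy uniform-price spot auction for 8-GPU nodes. The order book is
    # invented for illustration; real books are not public.
    def clearing_price(bids, capacity_nodes):
        # bids: (price_per_gpu_hour, nodes_wanted), filled best-first;
        # the lowest accepted bid sets the price for everyone.
        filled, price = 0, None
        for bid_price, nodes in sorted(bids, key=lambda b: -b[0]):
            filled, price = filled + nodes, bid_price
            if filled >= capacity_nodes:
                break
        return price

    quiet_friday = [(2.90, 40), (2.40, 40), (2.00, 120)]
    busy_tuesday = quiet_friday + [(3.20, 110)]  # a large training run lands

    print(clearing_price(quiet_friday, 150))   # 2.00
    print(clearing_price(busy_tuesday, 150))   # 2.90, a 45% jump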

This is the environment in which enterprises are making infrastructure decisions, and the decisions are not rational in any classical sense. VentureBeat reported on April 30 that enterprise GPU fleets average 5% utilization. Not 5% above plan. Five percent total. The data, originally published by Cast AI in a report released April 22, examined thousands of enterprise GPU instances across AWS, Azure, and GCP. The finding is not that engineers are misconfiguring workloads. The finding is that enterprises are buying capacity they cannot schedule work onto, because the procurement cycle runs faster than the model-development cycle, and nobody wants to be the VP who cancelled the reservation the week the research team finally ships.
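
One way to make the 5% figure legible is to fold it into the hourly rate. A back-of-the-envelope calculation, combining the SemiAnalysis reserved price with the Cast AI utilization average:

    # Effective cost per useful GPU-hour when most paid hours sit idle.
    reserved_rate = 2.35   # $/GPU-hour, the March 2026 reserved H100 rate
    utilization = 0.05     # Cast AI's enterprise fleet average

    effective_rate = reserved_rate / utilization
    print(f"${effective_rate:.2f} per utilized GPU-hour")  # $47.00

At fleet-average utilization, the enterprise is paying $47 for every GPU-hour that does scheduled work, twenty times the list rate.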

The FOMO loop works like this. A large enterprise signs a $40 million reserved-instance commitment for 1,000 H100s over three years. The commitment is made in Q1, after a six-month procurement process that began the previous summer, when H100 availability genuinely was scarce. By the time the instances are provisioned in Q2, the enterprise's AI team has shifted its roadmap to H200s for inference and is evaluating Blackwell for the next training run. The H100s sit idle. But the enterprise cannot cancel the reservation without paying a penalty that, in many contracts, equals 60% of the remaining commitment. So the GPUs stay reserved, stay idle, and subtract from the available supply that would otherwise push spot prices lower.
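
The hold-or-cancel arithmetic, sketched below with the contract terms described above (the remaining-commitment and utilization figures are illustrative), shows why the behavior is striking: at 5% utilization, paying the penalty is usually the cheaper option, which is exactly the sense in which these decisions are not classically rational.

    # Cancel-or-hold on a reserved commitment with a penalty equal to
    # 60% of the remaining commitment. Dollar figures are illustrative.
    def cheaper_to_cancel(remaining_commit, penalty_rate, value_of_use):
        cost_to_hold = remaining_commit - value_of_use   # spend it, use little
        cost_to_cancel = penalty_rate * remaining_commit # pay to walk away
        return cost_to_cancel < cost_to_hold

    remaining = 30_000_000              # $30M left on the $40M deal
    value_extracted = 0.05 * remaining  # value of use at 5% utilization
    print(cheaper_to_cancel(remaining, 0.60, value_extracted))  # True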

Meanwhile, on the other side of the market, prices keep climbing because demand from frontier labs has not slowed. Carmen Li, CEO of Silicon Data, told Business Insider in early April that prices are "going nuts." Li spent a decade at Bloomberg before founding Silicon Data to do for GPU compute what Bloomberg terminals did for financial data: publish real-time indexes, forward curves, and price transparency into a market that has historically operated on phone calls and NDAs. Her firm's indexes show Nvidia GPU rental prices rising by roughly 40% across the H100 and H200 product lines since late 2025, with Blackwell and Vera Rubin models marking up another 50% on top of that baseline.

The Forward Curve

On April 20, Silicon Data launched the first GPU Forward Curve, a dataset that lets an enterprise CFO see what a reserved H100 or H200 contract will cost not just today but six, twelve, and eighteen months out. The curve, as of early May 2026, slopes upward. A one-year H100 reservation priced at $2.35 per GPU-hour today projects to $2.65 by the end of 2026 and $2.95 by mid-2027, according to the curve's midpoint estimates. If those numbers hold, the three-year total cost of a 1,000-GPU cluster reserved today will come in 38% higher than the same cluster reserved in October 2025, a difference of roughly $17 million for a single mid-sized training cluster. The forward curve gives CFOs something they have never had before: a number to take to the board.
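
The arithmetic behind that comparison, assuming a three-year term billed flat at the quoted one-year rate with 24/7 metering:

    # Three-year cost of a 1,000-GPU cluster at the two reserved rates.
    gpus = 1_000
    hours = 24 * 365 * 3                 # 26,280 hours per GPU

    cost_oct_2025 = gpus * hours * 1.70  # ~$44.7M
    cost_mar_2026 = gpus * hours * 2.35  # ~$61.8M

    print(f"${cost_mar_2026 - cost_oct_2025:,.0f}")  # $17,082,000
    print(f"{cost_mar_2026 / cost_oct_2025:.2f}x")   # 1.38x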

The forward curve also reveals something about the structure of the market that spot prices alone obscure. The curve is steeper for short-duration contracts, those under three months, than for 36-month commitments. A three-month H100 reservation in May 2026 costs $2.90 per GPU-hour, a 23% premium over the one-year rate. That premium reflects the risk that the provider cannot re-lease the capacity when the short contract expires. In a market where demand is growing 30% quarter over quarter, the risk of idle capacity is near zero, so the premium is a pure margin item for the provider, not a hedge. Short-duration buyers pay a convenience fee, and it is rising.
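
The premium, and what it costs a short-duration buyer in absolute terms (assuming a 91-day quarter and an 8-GPU node):

    # Term-structure premium on short commitments, May 2026 rates.
    rate_3mo, rate_1yr = 2.90, 2.35   # $/GPU-hour

    print(f"{rate_3mo / rate_1yr - 1:.0%}")             # 23%
    convenience_fee = (rate_3mo - rate_1yr) * 8 * 24 * 91
    print(f"${convenience_fee:,.0f} per node-quarter")  # $9,610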

Who captures that margin? It depends on which layer of the stack you occupy. The hyperscalers (AWS, Azure, GCP) add a 15% to 35% management premium on top of the base GPU rental, depending on whether you buy their proprietary AI platform services or run bare metal. The neoclouds (CoreWeave, Lambda, Crusoe, Voltage Park) operate on thinner spreads, typically 8% to 12% over their own infrastructure cost, but they make it up in volume and in the spot-market surge pricing that their reserved-contract customers never see. The model providers (OpenAI, Anthropic, the frontier labs) capture margin on the inference side, where a token that costs them $0.000012 to generate on an H200 sells for $0.000150 to the end user, a 12.5x markup. The chip layer, Nvidia, captures roughly 65 cents of every dollar spent on AI compute infrastructure, a figure that has held remarkably steady across two product generations.
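
The inference-layer numbers are worth running, because the markup compounds into a striking gross margin per unit sold. The per-token figures are the ones quoted above:

    # Inference-layer economics from the quoted per-token figures.
    token_cost = 0.000012    # $/token to generate on an H200
    token_price = 0.000150   # $/token charged to the end user

    print(f"{token_price / token_cost:.1f}x markup")                 # 12.5x
    print(f"{1 - token_cost / token_price:.0%} gross margin")        # 92%
    print(f"${(token_price - token_cost) * 1e6:.0f} per 1M tokens")  # $138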

This margin structure matters because it determines how quickly the per-token price implied by each new GPU generation actually reaches the customer invoice. When Nvidia announces a 2x improvement in inference throughput on Blackwell versus H100, the first question to ask is: at what batch size? The 2x figure is typically measured at batch size 32, with a 2,048-token context, on a server configuration that a hyperscaler might use internally. At batch size 1, the regime that matters for chat applications where a single user query arrives asynchronously, the throughput gain is closer to 1.3x to 1.5x, depending on model architecture. The per-token price improvement the customer sees is therefore smaller than the press release implies, and it arrives six to nine months after the GPU ships, once the hyperscaler has amortized its deployment cost.
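
A sketch of how the batch-size gap flows through to per-token cost. Only the 2x and roughly 1.4x gain figures come from the text; the Blackwell hourly rate and the H100 serving throughput are assumptions chosen for illustration:

    # Per-token cost as a function of the throughput gain actually realized.
    h100_rate = 2.35                 # $/GPU-hour, reserved (from above)
    bwell_rate = 4.00                # $/GPU-hour, assumed Blackwell rate
    h100_tokens_per_hr = 3_000_000   # assumed H100 serving throughput

    def usd_per_million_tokens(rate, tokens_per_hr):
        return rate / tokens_per_hr * 1_000_000

    base = usd_per_million_tokens(h100_rate, h100_tokens_per_hr)
    b32 = usd_per_million_tokens(bwell_rate, h100_tokens_per_hr * 2.0)
    b1 = usd_per_million_tokens(bwell_rate, h100_tokens_per_hr * 1.4)
    print(f"H100:             ${base:.2f}/M tokens")  # $0.78
    print(f"Blackwell, b=32:  ${b32:.2f}/M tokens")   # $0.67
    print(f"Blackwell, b=1:   ${b1:.2f}/M tokens")    # $0.95

Under those assumed rates, the batch-32 buyer sees a per-token improvement while the batch-1 buyer sees a per-token regression: the press-release number and the invoice number diverge before datacenter overhead is even counted.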

The number on the benchmark slide is not the number on your invoice. A Blackwell GPU running at batch size 32 in a lab in Santa Clara is not the same asset as a Blackwell GPU serving a production chatbot at batch size 1 in a Virginia datacenter with a 98.5% uptime SLA. The difference between those two numbers is where the cloud margin lives. That assessment comes from a neocloud infrastructure lead at a company with roughly 15,000 GPUs under management, speaking on condition that neither he nor his employer be named, because his pricing agreements include confidentiality clauses that his legal team actually enforces. The delta between lab performance and production performance, he said, is typically 40% on throughput and 25% on latency, and it is priced in.

The other question to ask about any new GPU announcement: how long until the per-token price it promises actually appears on a customer invoice? History provides a rough clock. The H100 began shipping in volume in Q1 2023. The per-token inference price on H100-based instances dropped to within 15% of the long-run equilibrium, roughly the GPU's amortized cost plus a 20% cloud margin, by Q4 2024, a lag of roughly 20 months. The H200, which began volume shipments in Q3 2024, reached its pricing equilibrium by Q4 2025, roughly 15 months. The Blackwell generation, shipping in volume as of Q1 2026, will likely reach invoice-level pricing equilibrium by Q2 2027, if the pattern holds. Customers who buy Blackwell capacity today are paying a novelty premium that will erode by roughly 3% per month over the next 18 months.
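
If the premium compounds away at 3% per month, the path looks like this (the $4.00 Blackwell starting rate is an assumption; the 3% monthly erosion is the figure above):

    # Decay of an assumed Blackwell rate at ~3% per month over 18 months.
    start_rate = 4.00   # $/GPU-hour today (assumed)
    for month in (0, 6, 12, 18):
        print(f"month {month:2d}: ${start_rate * 0.97 ** month:.2f}/GPU-hour")
    # month  0: $4.00   month  6: $3.33   month 12: $2.78   month 18: $2.31

Compounded, that is roughly a 42% reduction over the 18-month window.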

The 5% utilization figure from Cast AI is not just a curiosity. It is the mechanism by which the novelty premium persists. If every enterprise that reserved H100s in 2024 were actually using them, the spot market would have more slack, and spot prices would fall. Instead, idle reserved capacity acts as a supply sink. The GPUs are technically in the market but functionally unavailable, because the enterprise holding the reservation has no incentive or mechanism to sublet them. A secondary market for GPU reservations does exist: brokers in New York and London will match a buyer with excess capacity to a seller who needs it, but it is opaque, relationship-based, and carries a 10% to 15% transaction cost that discourages all but the most desperate buyers.
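
What subletting recovers, using the midpoint of the quoted broker fee; the rate a secondary buyer would pay is an assumption:

    # Net recovery from subletting an idle reservation through a broker.
    sublet_rate = 2.20     # $/GPU-hour a buyer pays (assumed)
    broker_fee = 0.125     # midpoint of the 10-15% transaction cost
    reserved_rate = 2.35   # what the holder pays regardless (sunk)

    net = sublet_rate * (1 - broker_fee)
    print(f"${net:.2f}/GPU-hour net")                               # $1.93
    print(f"{net / reserved_rate:.0%} of the sunk rate recovered")  # 82%

Recovering 82 cents on the dollar beats the near-zero that 5% utilization implies, which suggests the friction here is contractual and organizational more than financial.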

There is a case that the market is beginning to correct, but the correction is slow. An MSN report published May 7 noted that provider pricing ranges for top GPU models now span a factor of 2x from the cheapest to the most expensive listing, a level of dispersion that typically signals either market segmentation or market inefficiency. In a mature market, price dispersion narrows as buyers comparison-shop. In the GPU rental market, dispersion is widening because different providers are selling fundamentally different products under the same SKU name: one provider's H100 instance includes InfiniBand and runs in a tier-4 datacenter with 99.99% uptime, while another's runs on Ethernet in a converted warehouse with best-effort cooling. The buyer who does not read the fine print pays the same hourly rate for substantially less compute.
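
One way to see the dispersion is to normalize the listed rate by what the instance actually delivers. The uptime and efficiency factors below are assumptions, not measured figures:

    # Effective cost per delivered GPU-hour once uptime and interconnect
    # are priced in. Both profiles are illustrative.
    listed_rate = 2.35
    profiles = {
        "tier-4, InfiniBand": {"uptime": 0.9999, "efficiency": 1.00},
        "warehouse, Ethernet": {"uptime": 0.97, "efficiency": 0.70},
    }
    for name, p in profiles.items():
        effective = listed_rate / (p["uptime"] * p["efficiency"])
        print(f"{name:20s} ${effective:.2f} per delivered GPU-hour")
    # tier-4, InfiniBand   $2.35
    # warehouse, Ethernet  $3.46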

The number to watch between now and October 2026 is the one-year reserved H100 price. SemiAnalysis had it at $2.35 in March. If it crosses $2.70 by September, the forward curve from Silicon Data is validated, and the enterprise procurement cycle that locked in $1.70 contracts last October will look like the smartest trade of the cycle. If it drops below $2.00, the FOMO premium has broken, and a lot of idle capacity will suddenly become visible on the spot market. Either way, the spread between the benchmark slide and the invoice is not going away. That spread is the business model.
