H100 Reserved GPU Contracts Hit $2.35/Hour as Spot Market Diverges
While reserved H100 pricing surged nearly 40% in six months to $2.35/hour, spot GPU markets are reshaping AI compute economics via idle capacity, forward curves, and a 5% utilization crisis.
tomshardware.com
On April 2, SemiAnalysis published its H100 1-year reserved rental contract index: $2.35 per GPU per hour as of March 2026. That is up nearly 40 percent from $1.70 per GPU per hour in October 2025. The assumption set is specific: one-year commitment, 8-GPU node, standard interconnect, no committed throughput floor. This is the number every CFO at an AI-native company now has taped to their monitor. It means a single 8xH100 node on a year-long reserve costs roughly $165,000, before networking, before storage, before the inference framework that will actually run on it. It also means the per-token cost implied by a reserved H100 running Llama-3-70B at batch size 32 and 1,024-token sequences is now approximately $0.0000014, assuming 60 percent hardware utilization. That number matters because it is the baseline against which every model provider, every API vendor, and every neocloud sets its margin.
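The baseline arithmetic can be sketched in a few lines; the assumed per-GPU throughput (780 tokens/s) is a hypothetical figure chosen to reproduce the reported per-token cost, not a measured benchmark:

```python
# Back-of-envelope reserved-H100 economics from the SemiAnalysis index.
# The 780 tokens/s throughput is an illustrative assumption, not a benchmark.

HOURS_PER_YEAR = 8_760

def node_annual_cost(rate_per_gpu_hour: float, gpus: int = 8) -> float:
    """Annual cost of a reserved node at a given per-GPU hourly rate."""
    return rate_per_gpu_hour * gpus * HOURS_PER_YEAR

def cost_per_token(rate_per_gpu_hour: float, tokens_per_sec: float,
                   utilization: float) -> float:
    """Implied cost per output token at a given hardware utilization."""
    effective_tokens_per_hour = tokens_per_sec * 3_600 * utilization
    return rate_per_gpu_hour / effective_tokens_per_hour

annual = node_annual_cost(2.35)              # ≈ $164,688 for an 8xH100 node
per_token = cost_per_token(2.35, 780, 0.60)  # ≈ $0.0000014 per token
```

Swapping in a different throughput or utilization assumption shows how sensitive the per-token baseline is to serving efficiency, which is why the 60 percent figure matters as much as the hourly rate.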
The spot premium problem is not accidental. It is structural. In a VentureBeat report published April 29, enterprise GPU fleets were found to average 5 percent utilization. Not 50 percent. Five. The figure comes from aggregated telemetry across dozens of enterprise deployments tracked by FinOps and platform teams. The cause is not misconfiguration or lack of orchestration tooling, though both exist. It is a procurement loop: the fear of missing out on GPU supply drove enterprises to lock in multi-year reserved contracts during the 2024-2025 shortage. Those contracts guaranteed capacity but not workload. Now no individual team will release its idle allocation because doing so means losing access to hardware that remains genuinely scarce in the reserved channel. The idle GPUs sit powered, cooled, and billed while spot buyers scramble for the remaining crumbs.
Carmen Li, who left Bloomberg to found GPU pricing intelligence startup Silicon Data, put the dynamic bluntly to Business Insider on April 6: cloud giants charge "big premiums" for GPU rentals, and even older chips are holding their value in ways that contradict every prior hardware depreciation curve. Silicon Data's indexes, which track actual transaction prices across neoclouds, hyperscalers, and broker channels, show the H100 continuing to appreciate in the reserved channel even as its silicon ages. An H100 purchased in mid-2024 for roughly $28,000 now commands a rental yield that implies a payback period of roughly 16 months at the current $2.35 reserved rate, shorter than the payback on most H200 deployments because the H200's higher upfront cost has not been fully offset by its higher throughput in mixed-precision inference workloads.
The bifurcation between spot and reserved markets becomes sharper when you layer in the new silicon. The B200 began shipping in volume in Q1 2026. On paper, it delivers 2.5x the FP8 throughput of the H100 at roughly 1.8x the price, which should make it the obvious choice for any training or inference buyer evaluating total cost of ownership. In practice, the B200 spot market barely exists. Most B200 capacity is pre-allocated to hyperscaler reserved contracts and to a handful of frontier model trainers who signed take-or-pay agreements before the chip taped out. The liquidity simply is not there.
This is where the per-token economics get interesting for the second tier. A model provider running on H100 spot at $1.60 per GPU-hour can serve Llama-3-70B at batch size 1, 512-token sequences, for roughly $0.0005 per 1,000 output tokens. That is one-third the price of comparable API endpoints from hyperscaler-managed services, which bundle the GPU cost with a platform margin that frequently exceeds 200 percent. The second-tier provider captures the spread, provided it can keep its GPUs fed. The constraint is not compute. It is demand: inference workloads are spiky, and a cluster that runs at 85 percent utilization for two hours can run at 12 percent for the next six. The spot buyer's cost advantage exists only during the valleys. During the peaks, spot pricing converges with reserved, and the margin disappears.
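Inverting the quoted prices shows what the spot buyer's serving stack has to sustain, and how the demand valleys dilute the headline rate. These are implied figures under the assumptions stated above, not measurements:

```python
# Throughput implied by the spot price and target token cost, plus the
# blended utilization of the spiky day described in the text.

def required_throughput(rate_per_gpu_hour: float,
                        target_cost_per_1k_tokens: float) -> float:
    """Tokens/sec a GPU must sustain for rental cost to hit the target."""
    tokens_per_hour = rate_per_gpu_hour / target_cost_per_1k_tokens * 1_000
    return tokens_per_hour / 3_600

tps_needed = required_throughput(1.60, 0.0005)   # ≈ 889 tokens/s

# Two hours at 85 percent utilization, six hours at 12 percent:
blended = (0.85 * 2 + 0.12 * 6) / 8              # ≈ 0.30
# Realized cost per token is therefore roughly 1/0.30 ≈ 3.3x the headline.
```

The blended-utilization line is the whole spot-market story in one number: the cost advantage quoted at full load roughly triples once the valleys are averaged in.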
On April 20, Silicon Data launched the industry's first GPU Forward Curve, an attempt to bring the kind of futures-market price discovery that exists for oil, natural gas, and DRAM to AI compute. The curve plots expected H100 and H200 rental prices across 3-month, 6-month, and 12-month horizons using a combination of broker quotes, supply-chain lead times, and power-capacity data from utility interconnection filings. The initial forward curve, as reported by the Democrat and Chronicle, implies a 15 to 22 percent premium for 12-month H100 reserved contracts relative to 3-month spot, a gap that widens as the forward date extends into Q4 2026. The market is pricing in a supply crunch timed to Blackwell ramp constraints and a wave of new model-training clusters coming online.
The forward curve matters for a reason that has nothing to do with financial engineering. It tells CFOs when to buy. If the futures market says a 12-month H100 contract costs 22 percent more than rolling 3-month spot contracts, the rational buyer waits until the last possible moment to commit, assuming spot liquidity holds. But spot liquidity is not guaranteed. This is the same dynamic that broke the enterprise procurement cycle: the spot market is cheap until it is not, and by the time it is not, the reserved channel has already repriced. The CFO who hedged by buying forward in October 2025 at $1.70 looks prescient. The one who waited until March 2026 paid $2.35. The one who waits until September 2026 will pay whatever the forward curve says, and that number is currently pointing north.
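The wait-or-commit decision reduces to scenario arithmetic. The quarterly spot rates below are hypothetical, chosen only to illustrate how repricing through the year can erode the spot discount:

```python
def reserve_vs_rolling_spot(reserve_rate: float,
                            quarterly_spot_rates: list[float],
                            hours_per_year: int = 8_760) -> tuple[float, float]:
    """Annual per-GPU cost: 12-month reserve vs four sequential 3-month spots."""
    reserve_cost = reserve_rate * hours_per_year
    spot_cost = sum(r * hours_per_year / 4 for r in quarterly_spot_rates)
    return reserve_cost, spot_cost

# Hypothetical path: spot starts cheap, then reprices upward each quarter.
reserved, rolling = reserve_vs_rolling_spot(2.35, [1.60, 1.90, 2.40, 3.10])
# reserved ≈ $20,586; rolling ≈ $19,710 -- close enough that one bad
# quarter flips the answer, which is the CFO's real exposure.
```

The spread between the two outcomes is smaller than the quarter-to-quarter volatility of the spot leg, which is exactly why a forward curve is worth having.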
The hyperscalers are extracting their pound of flesh at every point in this cycle. AWS, Azure, and GCP do not publish spot pricing for GPU instances in a way that allows clean comparison with neocloud spot markets, because their GPU instances bundle compute, networking, and platform services into a single SKU. But reverse-engineering the GPU component from published on-demand rates, as SemiAnalysis did in its March report, shows a hyperscaler premium of 80 to 150 percent over equivalent neocloud reserved pricing for the same silicon. The premium is the price of integration: you get the GPU, the VPC, the IAM role, the managed inference endpoint, and a single invoice. For a Fortune 500 compliance team, that is worth the spread. For a startup operating on 12 months of runway, it is an extinction-level line item.
The Axe Compute deal, announced April 22 on Markets Insider, illustrates the scale at which this game is now played. Axe Compute, a neocloud that went public in late 2025, signed a $260 million, 36-month contract to deploy 2,304 Nvidia B300 GPUs for a single unnamed enterprise customer. That works out to roughly $112,850 per GPU over the contract life, or about $5.37 per productive GPU-hour assuming 80 percent uptime, power and cooling included. It is a reserved deal with no spot exposure, no idle risk for Axe, and a locked margin for three years. The customer gets guaranteed B300 capacity at a rate that undercuts hyperscaler on-demand by an estimated 45 percent. The per-token cost implied by this deal, at batch size 64 on a 70B-parameter model, is likely below $0.0000008, a number that rewrites the unit economics for any inference-as-a-service business.
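The deal terms reduce to a few lines of arithmetic; the per-token line uses an assumed B300 throughput (about 1,900 tokens/s at batch 64 on a 70B model), which is a hypothetical figure, not a benchmark:

```python
# Decomposing the Axe Compute contract from its published terms.

CONTRACT_VALUE = 260_000_000     # dollars
GPUS = 2_304
CONTRACT_HOURS = 36 * 730        # ≈ 26,280 hours over 36 months
UPTIME = 0.80                    # stated uptime assumption

per_gpu = CONTRACT_VALUE / GPUS                      # ≈ $112,847 per GPU
per_gpu_hour = per_gpu / (CONTRACT_HOURS * UPTIME)   # ≈ $5.37 per productive hour

# Assumed throughput: ~1,900 tokens/s per GPU (hypothetical, not measured).
per_token = per_gpu_hour / (1_900 * 3_600)           # below $0.0000008
```

The same three-line decomposition works for any take-or-pay quote: contract value per GPU, then per productive hour, then per token under an explicit throughput assumption.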
The Axe Compute contract also reveals who captures the margin at each layer. The chip layer, Nvidia, captured its margin at the point of B300 sale to Axe, roughly $35,000 to $40,000 per GPU based on broker estimates for the B300 tray configuration. The cloud layer, Axe, captures a gross margin of approximately 35 to 40 percent on the rental stream after deducting depreciation, power, cooling, and datacenter lease costs. The model layer, the enterprise customer, captures whatever margin it can generate from the inference or training workloads it runs on those 2,304 GPUs. The app layer, if there is one, captures whatever is left. The stack is long, and the chipmaker still takes the largest single cut.
The 5 percent utilization figure from VentureBeat keeps surfacing in conversations with FinOps leads because it exposes the fundamental accounting problem with reserved GPU contracts. A reserved H100 at $2.35 per hour costs $20,586 per year whether it runs inference or sits idle. At 5 percent utilization, the effective cost per productive GPU-hour is not $2.35. It is $47.00. At that effective rate, running inference on a reserved H100 costs more per token than calling GPT-4o via API, which defeats the entire purpose of owning or reserving the hardware. The rational enterprise response, according to platform leads I spoke with at two AI-native startups, is to sublease idle reserved capacity on the spot market at whatever price it can fetch, effectively becoming a mini-neocloud. But most enterprises lack the operational capability, the legal clearance, and the pricing infrastructure to do so.
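The utilization accounting is mechanical but worth making explicit:

```python
# Effective cost per productive GPU-hour on a reserved contract.

def effective_rate(hourly_rate: float, utilization: float) -> float:
    """Cost per productive GPU-hour when most reserved hours sit idle."""
    return hourly_rate / utilization

annual_per_gpu = 2.35 * 8_760               # $20,586 per reserved H100 per year
at_5_percent = effective_rate(2.35, 0.05)   # $47.00 per productive GPU-hour
```

At 5 percent utilization the divisor, not the rate, dominates the economics: doubling utilization halves the effective cost, which is a far bigger lever than any price negotiation.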
"Everyone is sitting on a mountain of idle H100s, and no one will admit how big the mountain is, because the moment you admit it, your CFO asks why you renewed the contract," said a FinOps platform lead at a Series C AI startup, speaking on background.
The secondary effects are already rippling through the broader GPU ecosystem. The consumer GPU market, tracked by TechSpot in its Q2 2026 pricing survey published on April 28, shows that demand has fallen from its 2025 peak but prices remain elevated across the stack. The RTX 5090 still sells above MSRP at every major retailer, not because gamers are buying it but because small AI labs and independent researchers are snapping up consumer cards for inference workloads that cannot justify datacenter GPU economics. The consumer-to-datacenter arbitrage, where a $2,000 RTX 5090 running FP4 inference competes with a $2.35-per-hour H100 on cost per token at batch size 1, is a market signal that the reserved datacenter GPU market has overshot.
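The crudest version of the arbitrage math, deliberately ignoring power draw, memory capacity, and throughput differences between the two parts:

```python
# Break-even between owning a consumer card and renting datacenter silicon.

CARD_PRICE = 2_000      # RTX 5090 at nominal MSRP
H100_RATE = 2.35        # reserved $/GPU-hour

breakeven_hours = CARD_PRICE / H100_RATE    # ≈ 851 hours of rental
breakeven_days = breakeven_hours / 24       # ≈ 35 days of continuous rental
```

At batch size 1, where the H100's batching advantage goes unused, roughly five weeks of continuous rental costs as much as owning the consumer card outright, which is the overshoot signal described above.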
A separate report from MSN on May 7 compiled benchmark and pricing data across 14 cloud GPU providers and found wide hourly rate ranges for the H100, H200, and B200 that cannot be explained by differences in networking, storage, or support. The spread between the cheapest and most expensive H100 offering was 3.2x for equivalent silicon. The report attributed the gap to brand premiums, bundling strategies, and the simple fact that many buyers do not cross-shop once they have a relationship with a single provider. In a market that Silicon Data is trying to make transparent, opacity is still the dominant pricing strategy.
The spot market is not going to fix the utilization problem by itself. It is too thin, too volatile, and too fragmented. What might fix it is the forward curve. If CFOs can see a 12-month price projection for H100 and B200 compute, they can make procurement decisions on a cost-certainty basis rather than a fear-of-missing-out basis. They can decide, in September 2026, whether rolling 3-month spot contracts at the projected rate actually beats locking in a 12-month reserve. That decision requires data, and until April 2026 the market had none. Now it has the beginnings of a yield curve. It is not liquid, it is not deep, and it covers only a handful of GPU SKUs, but it exists. Silicon Data says it will add H200, B200, and L40S curves by Q3 2026. That timeline matters because the Blackwell ramp will be the first test of whether forward pricing can discipline a market that has been running on panic and premium since late 2022.
The reserve-versus-spot decision, for any AI infrastructure buyer in May 2026, reduces to two numbers. For the H100: $2.35 per GPU-hour on a 1-year reserve, versus a spot range of $1.40 to $3.10. For the B200: no functional spot market exists, so the reserved rate is the only rate, and brokers quote it between $4.20 and $5.10 per GPU-hour depending on cluster size and term. The decision is not which is cheaper. It is whether your workload can tolerate the spot market's volatility and whether your organization can survive the procurement conversation when the spot price spikes to $3.10 and stays there for three weeks. Most enterprises answer no to both questions, which is why reserved pricing keeps climbing even as utilization stays at 5 percent.
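A rough stress test, using a hypothetical spike scenario, shows why the decision is organizational rather than arithmetic: even a prolonged spell at the $3.10 ceiling does not necessarily make rolling spot more expensive than the reserve:

```python
# Annual per-GPU spot cost with a sustained price spike, vs the reserve.
# The 12-week spike scenario is hypothetical.

def annual_spot_cost(base_rate: float, spike_rate: float,
                     spike_weeks: int, hours_per_week: int = 168) -> float:
    """Annual per-GPU spot cost with a sustained spell at the spike price."""
    normal_weeks = 52 - spike_weeks
    return (base_rate * normal_weeks + spike_rate * spike_weeks) * hours_per_week

reserved = 2.35 * 8_760                          # ≈ $20,586 on the 1-year reserve
spiky_spot = annual_spot_cost(1.60, 3.10, 12)    # ≈ $17,002 despite 12 weeks at $3.10
```

On these numbers, spot only loses if the spike persists for roughly half the year; the reserve premium buys budget certainty and spares the procurement conversation, not a lower expected cost.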
The next checkpoint is July 2026, when Nvidia is expected to ship the first volume tranche of B200s to neoclouds and second-tier model providers, not just hyperscalers. If that supply materializes on schedule, the spot market for H100s should soften as buyers migrate to the newer silicon and release reserved H100 capacity back into the pool. If it slips, which Nvidia supply chains have done in every generation since Volta, the H100 spot market tightens further and the $3.10 ceiling becomes the new floor. Watch the SemiAnalysis index in the first week of August. The number will tell you whether the GPU market finally found its supply response or whether it is still running on fear.