GPU Spot Pricing Hits $2.35/Hr as Compute Trades Like Corn

In March 2026, the one-year contract price for an Nvidia H100 GPU in the cloud hit $2.35 per GPU-hour, a nearly 40 percent jump from $1.70 per GPU-hour six months earlier in October 2025. The figure comes from semiconductor research firm SemiAnalysis, cited by Seeking Alpha on April 2. Those numbers describe a contractually guaranteed, reserved-instance price, not the volatile spot market. The spot price swings wider still. And against that backdrop, two of the world's largest exchange operators are now racing to build a futures market that treats GPU compute the way Chicago treats bushels of corn.
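The percentages in this piece are simple ratios and are easy to check. Here is a minimal Python sketch that uses only the two SemiAnalysis prices cited above. The function name is illustrative.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new: positive for a rise, negative for a fall."""
    if old == 0:
        raise ValueError("old price must be nonzero")
    return (new - old) / old * 100.0


# H100 one-year contract price in $/GPU-hour: October 2025 vs. March 2026 (SemiAnalysis)
h100_rise = pct_change(1.70, 2.35)
print(f"H100 one-year contract: {h100_rise:+.1f}%")  # prints: H100 one-year contract: +38.2%
```

Falls and rises are not symmetric. `pct_change(1.00, 0.60)` is -40 percent, but getting from 0.60 back to 1.00 takes a rise of about 66.7 percent.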

The GPU allocation problem has split into two distinct markets, and the spread between them tells the story. Reserved instances, enterprise agreements, and committed-use discounts now dominate enterprise procurement: 45 percent of AWS customers use Reserved Instances, 41 percent use the AWS Enterprise Discount Program, and 29 percent rely on Spot Instances, according to Flexera's 2026 State of the Cloud report covered by CRN on April 3. On Azure, 43 percent use Enterprise Agreements and 40 percent use Reserved Instances. Google Cloud Platform customers lean hardest on Committed Use discounts, at 48 percent adoption. These commitment-based programs delivered tangible savings, Flexera noted, adding that the rise in their use 'is a positive trend' that 'may signal that customers are cautious about long-term commitments on these cloud platforms and are prioritizing greater flexibility.'

But commitment-based discounts do not solve the allocation problem for the buyer who needs a thousand GPUs tomorrow morning for a training run that might finish in three days. That buyer goes to the spot market, or to a neocloud that has already reserved the capacity and is willing to resell it at a markup. The spot market for high-end datacenter GPUs has no published ticker, no central clearing price, and no standard contract. It is a broker-mediated, phone-call-and-email market, and the prices it produces can move 40 percent in three weeks. BeInCrypto reported on May 29 that Nvidia H200 rental prices fell by that exact margin over a three-week window, a drop that rewrites the cost basis of any inference startup or training lab caught holding uncommitted capacity.

'These programs deliver tangible savings, so the rise in their use is a positive trend. This may signal that customers are cautious about long-term commitments on these cloud platforms and are prioritizing greater flexibility.' (Flexera, 2026 State of the Cloud report, as quoted by CRN)

The price dispersion across GPU SKUs is widening, not narrowing, as the chip generation cycle accelerates. An MSN-syndicated benchmark analysis published on May 7 found wide ranges in hourly rates for Nvidia's H100, H200, and the new Blackwell B200 across cloud providers. The H100, launched in 2022, now sits at the bottom of the stack for frontier training but remains the workhorse for inference and fine-tuning. Its price is therefore tugged in two directions: downward as supply floods the secondary market, and upward as inference demand scales with every new model deployment. The B200, by contrast, is priced at a scarcity premium that cloud providers are reluctant to quote publicly without a nondisclosure agreement.

The consumer GPU market tells the same story from the other end of the pipeline. In January, Engadget warned that the window for buying a new graphics card at the manufacturer's suggested retail price (MSRP) had already closed, dubbing the supply crunch 'the great RAMageddon of 2026.' The RTX 5070 Ti was nearly impossible to find at anything approaching MSRP. Consumer cards do not use the high-bandwidth memory (HBM) stacked on datacenter accelerators, but their GDDR7 comes from the same handful of DRAM makers, and their dies are fabricated by TSMC, which also builds the datacenter parts. When a hyperscaler places an order for a hundred thousand H200s, the consumer GPU pipeline does not get priority.

TechSpot's Q2 2026 GPU pricing survey, published April 27, concluded that the market remains broken even if it has stopped getting worse. The outlet's methodology tracks retail and secondary-market prices across multiple SKUs and regions. Its finding that prices had plateaued rather than corrected downward suggested a structural floor: AI demand had not relented; it had simply been absorbed into baseline capacity planning.

Compute Becomes a Commodity

The most significant signal that GPU allocation is moving from engineering procurement to financial instrument came in May 2026, when two exchange operators announced competing compute futures products within a week of each other. On May 12, CME Group announced a partnership with GPU market intelligence startup Silicon Data to launch cash-settled futures based on GPU rental benchmarks. Seven days later, on May 19, Intercontinental Exchange, owner of the New York Stock Exchange, disclosed its own partnership with index provider Ornn to build a competing product. Both contracts are subject to regulatory approval, but the race is on.

The financialization of compute is not an abstract development. A futures contract means a frontier lab can hedge the cost of a scheduled pre-training run six months out. It means a neocloud can offer a fixed-price GPU reservation to a customer while laying off the price risk on a derivatives exchange. And it means speculators who have never touched a PyTorch tensor can take a position on whether H200 prices will be higher or lower in December. MarketWatch reported on May 13 that CME's contracts would let investors bet on the price of computing power, framing AI compute as the latest commodity to be abstracted into a tradable index.
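The hedging mechanics are the standard ones for any cash-settled future. The sketch below assumes a hypothetical contract that settles against a GPU rental index in dollars per GPU-hour. It also assumes the lab ends up buying its compute at exactly the settlement price, so there is no basis risk. The class, the function names, and all the numbers are illustrative. Neither CME and Silicon Data nor ICE and Ornn has published contract terms.

```python
from dataclasses import dataclass


@dataclass
class FuturesPosition:
    """Hypothetical cash-settled GPU-hour futures position (illustrative terms only)."""
    futures_price: float  # $/GPU-hour agreed when the position is opened
    gpu_hours: float      # notional; positive = long (buyer hedge), negative = short

    def settlement_pnl(self, index_at_expiry: float) -> float:
        # Cash settlement pays (index - futures price) per GPU-hour of notional.
        return (index_at_expiry - self.futures_price) * self.gpu_hours


def hedged_compute_cost(gpu_hours: float, spot_price: float, hedge: FuturesPosition) -> float:
    """Spend on compute bought at spot, minus the gain (or plus the loss) on the hedge."""
    return gpu_hours * spot_price - hedge.settlement_pnl(spot_price)


# Hypothetical run: 1,000 GPUs for 72 hours, hedged with a long position at $2.35.
run_hours = 1_000 * 72
hedge = FuturesPosition(futures_price=2.35, gpu_hours=run_hours)
for spot in (1.80, 2.35, 3.00):
    print(f"spot ${spot:.2f}: net cost ${hedged_compute_cost(run_hours, spot, hedge):,.0f}")
```

Whatever the spot price does, net spend stays at 72,000 GPU-hours times $2.35, about $169,200. The lab gives up the benefit of a price drop in exchange for certainty. A neocloud selling fixed-price reservations would take the mirror-image short position.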

The index methodology is where the economics get interesting, and where the public record still runs thin. A GPU futures contract needs a reference price. Is it the one-year reserved-instance rate published by AWS? Is it a broker-polled average of neocloud quotes? Is it the spot price for an H100 SXM with 80 GB of HBM3 at a specific interconnect tier? The answer determines who can arbitrage the contract and who gets priced out. Silicon Data has said it will build its index from GPU rental benchmarks, but the precise sampling methodology, the treatment of batch size and sequence length, and the weighting across geographies and cloud tiers have not been published. ICE and Ornn have disclosed even less.
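To make the stakes concrete, here is one way a reference price could be built from broker quotes. First, filter the quotes to a single contract specification. Then take a volume-weighted median, so that one outlier quote or a thinly traded listing cannot move the print. This is a minimal sketch under stated assumptions. The `Quote` fields, the spec labels, the sample quotes, and the weighting rule are all hypothetical. None of them reflects Silicon Data's or Ornn's methodology, which neither has published.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    sku: str            # hypothetical spec label, e.g. "H100-SXM-80GB"
    interconnect: str   # hypothetical interconnect tier label
    price: float        # $/GPU-hour
    gpu_hours: float    # quoted volume, used as the weight


def weighted_median(values_weights: list[tuple[float, float]]) -> float:
    """Smallest value at which cumulative weight reaches half the total weight."""
    ordered = sorted(values_weights)
    total = sum(w for _, w in ordered)
    if total <= 0:
        raise ValueError("no eligible volume")
    running = 0.0
    for value, weight in ordered:
        running += weight
        if running >= total / 2:
            return value
    return ordered[-1][0]  # only reachable through float rounding; returns the top value


def reference_price(quotes: list[Quote], sku: str, interconnect: str) -> float:
    """Hypothetical index: volume-weighted median of quotes matching one contract spec."""
    eligible = [(q.price, q.gpu_hours) for q in quotes
                if q.sku == sku and q.interconnect == interconnect and q.gpu_hours > 0]
    return weighted_median(eligible)


# Made-up quotes for illustration only.
sample_quotes = [
    Quote("H100-SXM-80GB", "NVLink", 2.10, 5_000),
    Quote("H100-SXM-80GB", "NVLink", 2.40, 20_000),
    Quote("H100-SXM-80GB", "NVLink", 2.35, 10_000),
    Quote("H100-PCIe-80GB", "PCIe", 1.60, 30_000),
]
print(reference_price(sample_quotes, "H100-SXM-80GB", "NVLink"))
```

With these quotes, the SXM index prints $2.40. If the filter also let the cheaper PCIe quote in, the weighted median would fall to $2.10. Which quotes count is exactly the arbitrage question the methodology has to settle.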

The hyperscaler capex numbers explain why the exchanges are moving now. Total AI infrastructure spending by the largest cloud providers reached an estimated $700 billion to $725 billion, according to Seeking Alpha analysis published May 4, reinforcing supply constraints across the GPU infrastructure market. CoreWeave alone raised over $20 billion in capital in the first half of 2026, including an $8.5 billion term loan and a $2 billion direct investment from Nvidia, as 24/7 Wall St. detailed on May 19. The neocloud model, pioneered by CoreWeave and now pursued by Nebius, Lambda Labs, and others, is built on a simple spread: borrow against GPU collateral to buy more GPUs, then rent them at a margin over the financing cost.
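The spread can be written down in a few lines. This is a deliberately simplified sketch. It assumes the full purchase price is debt-financed at a fixed annual rate on an interest-only basis, and it depreciates each GPU straight-line over a fixed life. It ignores power, networking, and facility costs, which would narrow the spread further. Every input value below is hypothetical.

```python
HOURS_PER_YEAR = 8_760


def financing_cost_per_hour(capex_per_gpu: float, annual_rate: float,
                            useful_life_years: float, utilization: float) -> float:
    """Interest plus straight-line depreciation, spread over the hours a GPU is actually rented."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    annual_interest = capex_per_gpu * annual_rate       # interest-only loan on the full capex
    annual_depreciation = capex_per_gpu / useful_life_years
    billable_hours = HOURS_PER_YEAR * utilization
    return (annual_interest + annual_depreciation) / billable_hours


def spread_per_gpu_hour(rental_rate: float, cost_per_hour: float) -> float:
    """Margin on each rented GPU-hour; negative means every rental loses money."""
    return rental_rate - cost_per_hour


# Hypothetical fleet economics: $30,000 per GPU, 10% debt, 5-year life, 80% utilization.
cost = financing_cost_per_hour(30_000, 0.10, 5, 0.80)
for rate in (2.35, 2.35 * 0.60):  # today's contract price, then after a 40 percent drop
    print(f"rental ${rate:.2f}: cost ${cost:.2f}, spread ${spread_per_gpu_hour(rate, cost):.2f}")
```

Under these made-up inputs, a 40 percent cut in the rental rate shrinks the spread from about $1.07 to about $0.13 per GPU-hour. The financing cost does not fall along with the rental price.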

That model works brilliantly when GPU prices are rising. It becomes precarious when they fall, because the collateral value of a fleet of H100s declines along with the rental rate a customer will pay to use them. The 40 percent H200 rental price drop BeInCrypto reported in late May, if sustained, would compress the spread on every neocloud balance sheet. Futures contracts, by allowing neoclouds to hedge their inventory exposure, could make their business model more resilient to price swings. They could also, by creating a public price signal, make it harder for neoclouds to charge an opacity premium to customers who do not know the prevailing market rate.
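The link between rental rates and collateral value can be sketched with an income-style valuation, which treats each GPU as worth the present value of the net rental cash it can still earn. That valuation approach is an assumption for illustration. So are the operating cost, discount rate, remaining life, and loan balance below. None of this describes how any lender or neocloud actually marks its fleet.

```python
HOURS_PER_YEAR = 8_760


def collateral_value(rental_rate: float, opex_per_hour: float, utilization: float,
                     remaining_years: int, discount_rate: float) -> float:
    """Present value of a GPU's remaining net rental cash flows, paid annually in arrears."""
    annual_cash = (rental_rate - opex_per_hour) * HOURS_PER_YEAR * utilization
    return sum(annual_cash / (1.0 + discount_rate) ** year
               for year in range(1, remaining_years + 1))


def loan_to_value(loan_balance: float, value: float) -> float:
    """Loan-to-value ratio; above 1.0 the debt exceeds what the collateral is worth."""
    return loan_balance / value


# Hypothetical inputs: $0.40/hr operating cost, 80% utilization, 4 years left, 10% discount
# rate, and $25,000 of debt still outstanding per GPU.
for rate in (2.35, 2.35 * 0.60):
    value = collateral_value(rate, 0.40, 0.80, 4, 0.10)
    print(f"rental ${rate:.2f}: value ${value:,.0f}, LTV {loan_to_value(25_000, value):.2f}")
```

With these inputs, the same 40 percent rate cut roughly halves the per-GPU valuation, because operating costs do not fall with the rate. Loan-to-value rises from under 0.6 to above 1.1, the point at which a lender holding the GPUs as collateral is no longer fully covered.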

Who Captures the Margin

The stack from silicon to inference output contains four layers, and the margin does not distribute evenly. At the bottom, Nvidia captures the chip margin. The latest data shows H100 rental rates still climbing even as the B200 ramps. That suggests Nvidia has avoided the Osborne effect, in which announcing a successor product collapses demand for the current one before the new product ships. Nvidia's CFO confirmed in late May that H100 prices were still rising, Barchart reported, a disclosure that sent shares of neocloud operators CoreWeave, Nebius, and Iren up 4 percent or more in a single session.

Above the chip layer sits the cloud provider, whether a hyperscaler or a neocloud, that owns or finances the physical servers. The Flexera data shows that nearly half of all cloud customers are now on commitment-based pricing, which means cloud providers are trading a discount for predictable revenue. A reserved instance might save a customer 30 to 60 percent off on-demand rates. The cloud provider keeps the remainder as margin, plus any spot-market spikes above the commitment price. Above the cloud layer sits the model provider (Anthropic, OpenAI, or a second-tier lab), which buys raw compute at a blended rate somewhere between spot and reserved and sells inference tokens at a markup. At the top sits the end user, whose per-token price reflects this entire chain.
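The layering can be followed through to a per-token figure. This is a minimal sketch that assumes a single GPU serving at a fixed sustained throughput. The 30 to 60 percent reserved discount range comes from the text above. The on-demand price, spot price, reserved share, and throughput are hypothetical placeholders. The result is raw compute cost only, before any model-provider markup.

```python
def reserved_rate(on_demand: float, discount: float) -> float:
    """Reserved $/GPU-hour after a fractional discount off on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand * (1.0 - discount)


def blended_rate(spot: float, reserved: float, reserved_share: float) -> float:
    """Average $/GPU-hour when a fraction reserved_share of hours is bought reserved."""
    return reserved_share * reserved + (1.0 - reserved_share) * spot


def compute_cost_per_million_tokens(gpu_hour_price: float, tokens_per_second: float) -> float:
    """Raw compute cost of one million tokens from one GPU at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3_600
    return gpu_hour_price / tokens_per_hour * 1_000_000


# Hypothetical inputs: $4.00 on-demand, 45% reserved discount, $3.00 spot,
# 70% of hours reserved, 1,000 tokens/s per GPU.
reserved = reserved_rate(4.00, 0.45)
blend = blended_rate(spot=3.00, reserved=reserved, reserved_share=0.70)
print(f"reserved ${reserved:.2f}, blended ${blend:.2f}, "
      f"compute ${compute_cost_per_million_tokens(blend, 1_000):.3f} per million tokens")
```

The GPU-hour price sits in the numerator and throughput in the denominator. A cheaper hour and a faster serving stack therefore move the per-token cost by the same proportion.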

The question the futures market forces is whether the cloud layer's margin is sustainable once compute prices are transparent. If a financial contract settles against a public GPU rental index, anyone with a Bloomberg terminal can see what a fair H100 hour costs. The cloud providers that have built their businesses on opaque bundling of compute, networking, and software may find that the compute component is the first to be commoditized. The network effect and the software tooling, not the raw FLOPs, become the defensible moat.

India offers a live experiment in price discovery. In April, prices for Nvidia's B200 GPUs dropped 10 percent in the latest round of the IndiaAI Mission tender, MSN reported on April 28, reflecting aggressive bidding by compute service providers competing for government-subsidized capacity. The tender process, unlike the opaque broker market in the United States, forces providers to submit public bids. The 10 percent decline in a single round suggests that even the newest silicon is not immune to competitive pressure when pricing is transparent.

What remains invisible in all of these markets is the inference-specific pricing tier. Training runs are large, predictable, and amenable to reservation contracts. Inference workloads are bursty, latency-sensitive, and often too small individually to justify a one-year commitment. A model serving millions of chat completions per day needs capacity that can scale up and down on short notice, which sounds exactly like a spot-market use case. And yet the spot market, as it exists today, was not designed for sub-second latency guarantees or the kind of tail-latency requirements that a production chatbot demands. The cloud providers do not publish separate inference spot prices as a standard offering, and the broker market does not differentiate by batch size or sequence length in its quotes.

The Flexera report noted that AWS Spot Instances are used by 29 percent of customers, while Azure low-priority VMs are used by just 16 percent. Both figures undercount the real volume of interruptible compute because they exclude the neoclouds entirely. A startup renting a hundred H100s from CoreWeave on a three-month contract is effectively in the spot market; the contract length is too short to qualify as reserved, and the price floats with whatever the neocloud thinks it can charge. The neoclouds have no equivalent of the Flexera survey. Their pricing is opaque by design.

A final variable that no index methodology has yet addressed is the batch-size question. A GPU-hour is not a fungible unit of compute unless you specify the workload profile. A training run at batch size 32 saturates the GPU differently from an inference workload at batch size 1. Memory bandwidth bottlenecks differ. The tokens-per-second throughput differs. A futures contract that settles against an undifferentiated GPU-hour price creates an incentive to deliver the lowest-quality GPU-hour that satisfies the contract specification, a dynamic familiar to anyone who has traded physical commodity futures where quality grades matter. The exchanges will need to define what 'one H100-hour' means with enough precision that it cannot be gamed.
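Physical commodity futures handle this problem with grade differentials. A deliverable that falls short of par is accepted at a discount, and anything below the lowest grade is rejected. Here is a minimal sketch of how that idea might carry over to a GPU-hour. It assumes quality is measured as sustained tokens per second on a fixed reference workload with a stated batch size and sequence length. The grade names, thresholds, and discounts are hypothetical. No exchange has published anything like them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Grade:
    name: str
    min_tokens_per_s: float  # throughput floor on the (hypothetical) reference workload
    discount: float          # fractional price reduction versus par


# Hypothetical grade table, best grade first. Values are illustrative only.
GRADES = (
    Grade("par", 2_000.0, 0.00),
    Grade("sub-par A", 1_500.0, 0.10),
    Grade("sub-par B", 1_000.0, 0.25),
)


def graded_price(contract_price: float, measured_tokens_per_s: float) -> Optional[float]:
    """Price for one GPU-hour at its grade, or None if it fails every grade."""
    for grade in GRADES:
        if measured_tokens_per_s >= grade.min_tokens_per_s:
            return contract_price * (1.0 - grade.discount)
    return None  # not acceptable under this specification


for measured in (2_400.0, 1_200.0, 600.0):
    print(measured, graded_price(2.35, measured))
```

A cash-settled contract could use the same table to decide which quotes are eligible for the index. Then a provider could not pull the reference price down by listing GPU-hours that would never pass the benchmark.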

The checkpoint to watch for is the first regulatory filing from either ICE or CME that contains the draft contract specification, including the index methodology, the deliverable definition, and the position limits. That filing will answer the question of whether compute futures are a real risk-management tool for AI labs or a speculative vehicle for financial traders who will never power on a single GPU. The per-token price implied by these contracts, once they begin trading, will show up on customer invoices eventually, but the lag could be measured in quarters or years. In the meantime, the H100 one-year contract price, $2.35 per GPU-hour and still climbing, is the only number that matters.
