TechReaderDaily.com
Compute Economics · Pricing

GPU Pricing Is Now a Derivatives Market as ICE, CME Race

With H100 rental rates up 38% in six months and cloud premiums hitting 3x over bare-metal, Wall Street is launching compute futures to turn GPU hours into a tradable asset class.

In this article
  1. The margin waterfall: who captures what
  2. What to watch for

On May 19, Intercontinental Exchange, the parent company of the New York Stock Exchange, announced it would launch cash-settled GPU compute futures in partnership with a firm called Ornn, whose Compute Price Index will serve as the settlement benchmark. The news came less than a week after CME Group revealed its own compute futures product, built on an index from startup Silicon Data. Two of the world's largest derivatives exchanges, separated by five trading days, had just declared GPU hours a tradable commodity.

The timing was not subtle. Nvidia's H100, still the workhorse of frontier training runs more than three years after launch, saw one-year reserved contract pricing hit $2.35 per GPU-hour in March 2026, according to research firm SemiAnalysis, up from $1.70 in October 2025, a move of just over 38% in less than six months. That number, reported by Seeking Alpha, reflects contract pricing, meaning committed reserved instances, not the volatile spot market where prices swing wider and faster.

The GPU spot market has spent the past 18 months behaving like a commodity that refuses to admit it is one. Brokers quote hourly rates for H100 SXM5 80GB instances that range from just under $2.00 to north of $4.50, depending on geography, term length, and whether the silicon sits in a Tier III datacenter or a cryptomine retrofit. Reserved capacity, when you can get it, locks at roughly $2.35 to $2.80 per GPU-hour on annual commitments. Spot, by contrast, has touched $5.00 per GPU-hour during allocation crunches, then slumped to $2.20 when a large buyer released unused reservations back to the market. The spread between spot and reserved, once a sleepy 15 to 20 percent, now routinely exceeds 60 percent.

Carmen Li, the former Bloomberg data executive who now runs Silicon Data, has been building the index infrastructure that both futures markets will depend on. In an interview with Business Insider in April, Li described the price dynamics in plain terms. GPU prices are "going nuts," she said, citing Silicon Data's index data showing significant rental price increases across Nvidia's lineup over the preceding months. What makes Li's index different from the broker quotes that already circulate on Discord and Telegram is methodology: Silicon Data collects pricing from cloud providers, neoclouds, colocation operators, and bare-metal brokers, then publishes forward curves, the same shape of data that natural gas traders have used for decades.

"Cloud giants charge big premiums for GPU rentals." (Silicon Data CEO Carmen Li, as quoted by Business Insider, April 2026)

The premium Li flagged is measurable. A May 2026 benchmark survey of cloud GPU pricing from multiple providers found that the three major hyperscalers, Amazon Web Services, Google Cloud, and Microsoft Azure, charge anywhere from $3.50 to $5.60 per GPU-hour for on-demand H100 instances. The same chips available through neoclouds like CoreWeave, Lambda Labs, or Nebius price between $2.10 and $2.90 on reserved terms, and sometimes below $2.00 on spot when utilization dips. The premium for the hyperscaler badge, integrated billing, and guaranteed availability runs roughly 60 to 100 percent.
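A quick way to check the premium range is to compare the two price bands end to end. The sketch below is illustrative only: the rates are the survey figures quoted above, while the function name and the like-for-like pairing (low end against low end, high end against high end) are assumptions, not the survey's method.

```python
def premium_pct(hyperscaler_rate: float, neocloud_rate: float) -> float:
    """Percentage premium of the hyperscaler rate over the neocloud rate."""
    return (hyperscaler_rate / neocloud_rate - 1.0) * 100.0


# USD per GPU-hour, from the survey figures quoted in the article.
HYPERSCALER_ON_DEMAND = (3.50, 5.60)
NEOCLOUD_RESERVED = (2.10, 2.90)

# Assumption: pair the low ends together and the high ends together.
low_vs_low = premium_pct(HYPERSCALER_ON_DEMAND[0], NEOCLOUD_RESERVED[0])
high_vs_high = premium_pct(HYPERSCALER_ON_DEMAND[1], NEOCLOUD_RESERVED[1])

print(f"low end premium:  {low_vs_low:.0f}%")   # about 67%
print(f"high end premium: {high_vs_high:.0f}%")  # about 93%
```

Both results fall inside the 60 to 100 percent band. Note that this compares on-demand hyperscaler rates with reserved neocloud rates, so part of the gap reflects term length rather than the provider alone.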

This is the spread that futures contracts are meant to arbitrage. ICE's product, developed with Ornn, will be US dollar-denominated and cash-settled, meaning no physical delivery of GPU hours. CME's product, Bloomberg Law reported, uses Silicon Data's benchmarks to create a market where a frontier lab can lock in its inference costs for the next quarter, or a neocloud can hedge the value of its uncommitted H200 fleet. Both exchanges are targeting enterprise CFOs who currently book GPU spend as a variable cost line with zero price certainty beyond the current billing cycle.
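How a cash-settled contract would lock in a cost can be sketched with generic futures arithmetic. Everything here is hypothetical: neither exchange's contract specifications appear in this article, so the contract size, prices, and function name are illustrative assumptions, and margin requirements and fees are ignored.

```python
def long_hedge_outcome(futures_price: float, settlement_index: float,
                       physical_price: float, gpu_hours: float) -> dict:
    """Net cost for a buyer who goes long cash-settled futures as a hedge.

    futures_price    : price locked in when the hedge is opened (USD/GPU-hour)
    settlement_index : benchmark index value at expiry (USD/GPU-hour)
    physical_price   : what the buyer actually pays its provider (USD/GPU-hour)
    """
    futures_pnl = (settlement_index - futures_price) * gpu_hours  # cash settlement
    physical_cost = physical_price * gpu_hours                    # real rental bill
    net_cost = physical_cost - futures_pnl
    return {
        "futures_pnl": futures_pnl,
        "physical_cost": physical_cost,
        "net_cost": net_cost,
        "effective_rate": net_cost / gpu_hours,
    }


# Hypothetical lab hedging 100,000 GPU-hours at $2.35, index settles at $2.80.
perfect = long_hedge_outcome(2.35, 2.80, physical_price=2.80, gpu_hours=100_000)
# Same hedge, but the lab's own provider charges $3.00 (basis of $0.20).
with_basis = long_hedge_outcome(2.35, 2.80, physical_price=3.00, gpu_hours=100_000)

print(f"effective rate, no basis:   ${perfect['effective_rate']:.2f}")
print(f"effective rate, with basis: ${with_basis['effective_rate']:.2f}")
```

When the buyer's actual rental price matches the index, the effective rate collapses back to the locked-in $2.35. When it does not, the difference (the basis) passes straight through to the buyer, which is why the index methodology matters so much.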

The practical question, and the one that will determine whether these contracts attract open interest or sit dormant like frozen orange juice futures during a synthetic substitute glut, is what the underlying actually is. A natural gas futures contract settles against a physical delivery point, Henry Hub, with standardized thermal content. A GPU-hour is not a GPU-hour. An H100 running at batch size 1 for a latency-sensitive inference workload produces different economics than an H100 grinding through batch size 32 matrix multiplies for a training run. Memory bandwidth saturation, power cap settings, cooling overhead, and whether the instance sits behind an InfiniBand or Ethernet fabric all change what the buyer receives.

An IEEE Spectrum investigation published via AOL in April documented this variability in granular terms. Two H100s from the same cloud provider, same SKU, same nominal spec sheet, delivered measurably different throughput on identical inference workloads. The gap was not trivial. One chip consistently ran 8 to 12 percent faster than the other, a difference that compounds into thousands of dollars of wasted compute over a one-year reservation. The investigation called it a "silicon lottery," and it highlights the indexing problem that Ornn and Silicon Data both have to solve before their benchmarks can support a liquid derivatives market.

The TechSpot GPU pricing tracker, published in late April, reached a parallel conclusion about the consumer and prosumer segments of the market. Retail GPU pricing had stopped getting actively worse, but remained broken by any historical measure. Cards that would have cleared at $1,200 in a pre-AI market sat at $2,400 with four-week lead times. The report noted that while B200 allocation had begun to relieve some pressure on H100 availability, the relief was not translating into lower prices for the older silicon, contrary to the normal depreciation curve for server hardware. H100s were holding value because they work and because they are available, two qualities that still do not reliably coexist in any single GPU SKU.

The ETF industry smelled the opportunity before the exchanges did. The OK Computer Power ETF filed its third application for compute futures exposure in early May, Crypto Briefing reported, joining Roundhill and ProShares in a three-way race to launch a product that tracks futures contracts that did not yet exist. The filings were provisional, placeholder S-1 language that reserved tickers and described an investment strategy built on an index of GPU compute futures. The circularity was almost elegant: ETFs filing to track futures that had not launched, based on indexes that were still being built, measuring a spot market where prices changed faster than the index rebalancing schedule could accommodate.

Behind the financial engineering sits a physical allocation problem that no derivatives contract can solve. Nvidia's H100 delivery lead times from major OEMs have stabilized at 8 to 12 weeks, down from the 20-plus weeks of mid-2024's scramble, but the stabilization came alongside a shift in who gets priority. Nvidia disclosed in its most recent quarterly filing that it held an 11 percent stake in neocloud operator CoreWeave, up from roughly 6.3 percent the prior year. The stake, combined with the $2 billion capital infusion announced in January 2026, effectively means Nvidia allocates silicon to its own ecosystem first. Everyone else queues behind.

The margin waterfall: who captures what

Trace a single H100 GPU-hour from silicon to inference token, and the margin stack reveals the structural tension that futures markets are being built to resolve. Nvidia sells the SXM5 module to a server OEM for roughly $25,000 to $28,000, depending on volume. The OEM integrates it into an HGX baseboard, adding networking, power delivery, and thermal management, and sells the assembled node to a cloud operator for approximately $250,000 per 8-GPU node, or roughly $31,250 per GPU after integration, a markup of about 12 to 25 percent over bare chip cost. The operator then depreciates that hardware over four to six years and layers on datacenter overhead, power, cooling, networking, and staffing, arriving at a fully loaded cost per GPU-hour of roughly $1.10 to $1.50, depending on utilization assumptions and power purchase agreements.
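The hardware end of the waterfall can be reproduced from the quoted figures. This is a rough sketch: the node and chip prices are the article's, while the function names, the straight-line depreciation, and the 80 percent utilization figure are assumptions made for illustration. It covers hardware depreciation only, not power, cooling, networking, or staffing.

```python
HOURS_PER_YEAR = 24 * 365


def per_gpu_node_cost(node_price: float, gpus_per_node: int = 8) -> float:
    """Integrated cost per GPU for an assembled multi-GPU node."""
    return node_price / gpus_per_node


def markup_pct(sell_price: float, cost: float) -> float:
    """Markup of sell_price over cost, in percent."""
    return (sell_price / cost - 1.0) * 100.0


def depreciation_per_gpu_hour(per_gpu_cost: float, years: float,
                              utilization: float) -> float:
    """Straight-line hardware depreciation per billable GPU-hour."""
    return per_gpu_cost / (years * HOURS_PER_YEAR * utilization)


per_gpu = per_gpu_node_cost(250_000)           # 31,250 USD per GPU
markup_low = markup_pct(per_gpu, 28_000)       # against the dearest chip price
markup_high = markup_pct(per_gpu, 25_000)      # against the cheapest chip price

ASSUMED_UTILIZATION = 0.80  # assumption, not a figure from the article
dep_4y = depreciation_per_gpu_hour(per_gpu, 4, ASSUMED_UTILIZATION)
dep_6y = depreciation_per_gpu_hour(per_gpu, 6, ASSUMED_UTILIZATION)

print(f"per-GPU node cost: ${per_gpu:,.0f}")
print(f"OEM markup: {markup_low:.1f}% to {markup_high:.1f}%")
print(f"depreciation only: ${dep_6y:.2f} to ${dep_4y:.2f} per GPU-hour")
```

Under these assumptions depreciation alone accounts for a large share of the fully loaded cost, with datacenter overhead making up the rest.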

The operator then rents that GPU-hour at $2.35 reserved or $3.50-plus on-demand, capturing a gross margin of roughly 40 to 60 percent at reasonable utilization rates. The model provider renting that capacity, an Anthropic, a Cohere, or a mid-tier lab running post-training on Llama-derived architectures, then converts those GPU-hours into tokens at a rate that depends entirely on model architecture, batch size, and sequence length. At batch size 32 with 4,096-token input sequences, an H100 running Llama-3-70B-class inference can produce roughly 2,500 to 3,500 output tokens per second. At $2.35 per GPU-hour, that yields an inference cost of roughly $0.19 to $0.26 per million output tokens before any model-provider margin.
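The per-token figure follows directly from the rental rate and the throughput range. A minimal sketch, using the article's numbers and a hypothetical helper name, and assuming the GPU sustains the quoted throughput for the full billed hour:

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_tokens(gpu_hour_price: float,
                            tokens_per_second: float) -> float:
    """Raw inference cost in USD per million output tokens."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return gpu_hour_price / tokens_per_hour * 1_000_000


RESERVED_RATE = 2.35  # USD per GPU-hour, from the article

slow = cost_per_million_tokens(RESERVED_RATE, 2_500)
fast = cost_per_million_tokens(RESERVED_RATE, 3_500)

print(f"at 2,500 tok/s: ${slow:.2f} per million tokens")
print(f"at 3,500 tok/s: ${fast:.2f} per million tokens")
```

Any idle time or lower batch occupancy raises the real figure, since the rental bill does not shrink when throughput does.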

The model provider then sells those tokens to an application developer at $0.50 to $2.00 per million tokens for API access, a markup that ranges from 2x to 10x over raw inference cost depending on the model tier and the provider's pricing strategy. The app developer layers on product margin. The end customer pays a subscription. Five layers of markup separate the chip from the chatbot response. A futures contract on the second layer, the GPU-hour rental rate, hedges only one slice of the stack, but it is the slice where pricing has been most opaque and most volatile.
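The 2x to 10x range can be checked by dividing the API price band by the raw inference cost band. The sketch below uses the figures quoted in the text. The helper name is hypothetical, and pairing the cheapest API price with the highest raw cost (and the reverse) is an assumption about how the extremes of the range arise.

```python
def markup_multiple(api_price: float, raw_cost: float) -> float:
    """How many times the raw inference cost the API price represents."""
    return api_price / raw_cost


# USD per million tokens, from the article.
API_PRICE_RANGE = (0.50, 2.00)
RAW_COST_RANGE = (0.19, 0.26)

# Smallest multiple: cheapest API tier over the highest raw cost.
smallest = markup_multiple(API_PRICE_RANGE[0], RAW_COST_RANGE[1])
# Largest multiple: priciest API tier over the lowest raw cost.
largest = markup_multiple(API_PRICE_RANGE[1], RAW_COST_RANGE[0])

print(f"markup range: {smallest:.1f}x to {largest:.1f}x")
```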

What to watch for

Three checkpoints will determine whether compute futures become a genuine risk-management tool or a financial curiosity. First, the index construction details that Ornn and Silicon Data have promised but not yet published. A compute index that cannot account for the silicon lottery, power-cap variance, and interconnect topology is an index that will drift from physical delivery economics within weeks of launch. Second, the open interest in the first contract month. Commodities traders call it the "first roll" test: if the initial cohort of hedgers cannot roll their positions without blowing out the bid-ask spread, the contract dies. Third, and most consequential, the behavior of the hyperscalers. AWS, Google Cloud, and Azure currently capture the widest margins in the GPU rental market by bundling compute with managed services, data egress, and enterprise support contracts. A transparent futures price that exposes the spread between their on-demand rates and the broker market could force those margins down, which is exactly what futures markets are supposed to do and exactly why incumbents resist them.

The B200 is now shipping in volume, and IndiaAI's tender data from April showed a 10 percent price decline in the latest round of bidding for the new silicon. The supply response is real. But H100 pricing stubbornly refuses to follow the old depreciation curve, and H200 pricing, roughly $2.80 to $3.20 per GPU-hour reserved, shows that each new node on Nvidia's roadmap resets the pricing floor higher, not lower. The futures contracts that ICE and CME are racing to launch will, if they work, tell the market what a GPU-hour three months forward should cost. Right now, nobody can tell you that number with a straight face. That is the entire point.
