TechReaderDaily.com

CoreWeave's Capex Warning Exposes Inference-Cloud Tensions

CoreWeave posted 112% revenue growth and a $740 million quarterly loss in the same afternoon, spotlighting the first real test for GPU-as-a-service economics as the AI market pivots from training to inference.

Exterior of CoreWeave's data center at Flexential's Douglasville, Georgia, facility near Atlanta, a large industrial building housing GPU cloud infrastructure. (Photo: dgtlinfra.com)
In this article
  1. The Economics of Being the Landlord
  2. What the Hyperscalers Cannot Afford to Lose

On the afternoon of May 7, CoreWeave CEO Michael Intrator took questions from analysts after the company's first-quarter earnings release and, by his third answer, had already been asked twice about capital expenditure. The company had just posted revenue of $2.08 billion, up 112 percent year over year and ahead of the $1.98 billion consensus. The net loss, at $740 million, was wider than the Street expected. Adjusted loss per share came in at $1.12, compared to the 90 cents analysts had modeled. But the line that sent the stock down more than 8 percent in after-hours trading was the full-year capex revision: CoreWeave raised its spending forecast, citing rising component costs for the Nvidia GPUs that are the only reason the company exists.

It was the third consecutive quarter in which CoreWeave's revenue growth was outpaced by the growth in what it costs to deliver that revenue. The pattern is not a fluke of the quarter. In the preceding weeks, CoreWeave had signed $21 billion in contracts with Meta Platforms and a multiyear deal with Anthropic within a 48-hour window, as Janakiram MSV reported for Forbes. The company now counts nine of the top ten AI labs as customers. That demand signal is real and deep. The question the Q1 print raised is whether the neocloud economic model can convert that signal into sustainable free cash flow, or whether it is structurally wired to pass the margin to Nvidia.

CoreWeave is the largest and most closely watched of the neoclouds, a category that also includes Lambda, Crusoe, Nebius, and Nscale. These companies share a common origin story: they emerged between 2022 and 2024 as stopgaps during the acute GPU shortage, building specialized cloud infrastructure that offered AI labs what the hyperscalers could not. That meant bare-metal access to large fleets of Nvidia H100 and B200 accelerators without the overhead of AWS's or Azure's generalized compute platforms. Their value proposition was simple: faster provisioning, fewer abstraction layers, and lower per-GPU-hour cost for training runs. By early 2026, that proposition was being tested on two fronts simultaneously: the shift from training to inference workloads, and the hyperscalers' own relentless buildout of AI-specific capacity.

The inference-cloud market is where the long game is being played. Training a frontier model requires a concentrated burst of compute over weeks or months. Serving that model to millions of users requires a geographically distributed, always-on fleet that earns revenue per token. The addressable market for inference is potentially larger, but the workload characteristics are entirely different: lower utilization per GPU, higher sensitivity to latency, and more fragmented demand. For neoclouds that built their fleets around large-block training reservations, the pivot to inference means rearchitecting capacity that was designed for a different economic profile. This is not lost on the hyperscalers. Google Cloud, at its Next 2026 event in April, unveiled its eighth-generation TPU split into two distinct designs: the TPU 8t for training and the TPU 8i for inference, a signal that the company sees the two workloads diverging enough to warrant separate silicon.

Google's TPU push is the backdrop for one of the more revealing signals of the quarter. On May 5, The Information reported that Nebius, Lambda, and CoreWeave had all declined to adopt Google's TPUs, despite what the publication described as an escalating campaign by Google to expand the reach of its tensor processing units beyond its own cloud. The neoclouds are saying no to the only credible alternative to Nvidia's datacenter GPU monopoly. The reasons are practical and strategic. Practically, adopting TPUs would mean retooling their entire infrastructure stack around Google's software ecosystem, a bet most neoclouds cannot afford to make while they are still scaling their Nvidia fleets. Strategically, the neoclouds' core pitch to customers is access to the GPU architecture those customers are already building against.

The TPU rejection is also a bet on Nvidia's continued dominance of the inference market, which is not a foregone conclusion. Custom ASICs from Broadcom, Marvell, and in-house designs at the hyperscalers are beginning to capture inference workloads at the edge and in cost-sensitive deployments. General Compute launched an ASIC-first inference cloud in April, as MarketWatch reported, targeting autonomous AI agent workloads on purpose-built silicon. Parasail raised $32 million for a pay-per-token inference cloud that abstracts away the underlying chip architecture entirely. If inference becomes a commodity market served by a mix of GPU and non-GPU silicon, the neoclouds' Nvidia-only fleets look less like a moat and more like an expensive concentration bet.

The Economics of Being the Landlord

The CNBC report on April 25 captured the tension well. Wall Street is getting bullish on neoclouds, the headline read, but "these stocks hold more risk than other AI plays." McKinsey had warned, CNBC noted, that the economics of the neocloud model are fragile. The fragility comes from three directions. First is component cost inflation: the same Nvidia GPUs that are the neoclouds' only product are also their largest input cost, and Nvidia sets the price. Second is customer concentration: a handful of AI labs account for the majority of neocloud revenue, and those labs have the balance-sheet strength to eventually build their own infrastructure. Third is the debt load: neoclouds have financed their GPU fleets with billions in debt secured against the GPUs themselves, a structure that works until GPU residual values decline or interest costs rise.
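A toy calculation makes the third fragility concrete. The figures below are invented, not any neocloud's actual balance sheet: a loan sized against the GPU fleet at purchase stays covered only while residual values hold up, and a faster depreciation curve pushes loan-to-value past 100 percent within a year.

```python
# Toy model of GPU-backed debt, with invented figures: a loan sized
# against the fleet at purchase stays covered only while GPU residual
# values hold up. Loan-to-value (LTV) above 100% means the collateral
# no longer covers the debt.

def residual_value(purchase_price, annual_depreciation, years):
    """Geometric depreciation of the GPU fleet's resale value."""
    return purchase_price * (1 - annual_depreciation) ** years

fleet_cost = 5_000_000_000   # hypothetical $5bn GPU fleet
loan = 3_500_000_000         # 70% loan-to-value at purchase
for dep in (0.20, 0.35):     # slow vs fast depreciation scenarios
    for year in (1, 2, 3):
        ltv = loan / residual_value(fleet_cost, dep, year)
        print(f"depreciation {dep:.0%}, year {year}: LTV {ltv:.0%}")
```

Under the slow scenario the lender stays covered for roughly a year; under the fast one, the loan exceeds the collateral almost immediately, which is why GPU residual values are the load-bearing assumption of the whole structure.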

CoreWeave's Q1 numbers illustrate the cost-revenue squeeze in detail. The revenue beat was driven by the commencement of the Meta and Anthropic contracts, which convert committed capacity into recognized revenue. But the company's cost of revenue rose faster than the top line, compressing gross margins. The capex revision, which CoreWeave attributed to component costs, suggests that even as the company achieves scale, the marginal cost of adding a GPU to the fleet is not declining. In a traditional cloud business, scale brings down unit costs through volume discounts and infrastructure efficiency. In the neocloud model, scale means buying more chips from Nvidia at whatever price Nvidia is charging this quarter. The difference is structural, and it is not clear that revenue growth alone can close the gap.
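The arithmetic of the squeeze is simple enough to sketch. The numbers below are illustrative, not CoreWeave's disclosures: whenever cost of revenue compounds faster than revenue, gross margin falls every quarter no matter how large the top line gets.

```python
# Hypothetical figures, not CoreWeave's actual disclosures: when cost
# of revenue grows faster than revenue, gross margin compresses every
# quarter even as the business scales.

def gross_margin(revenue, cost_of_revenue):
    """Gross margin as a fraction of revenue."""
    return (revenue - cost_of_revenue) / revenue

# Assume revenue grows 25% per quarter while cost of revenue grows 30%.
revenue, cost = 2.08, 1.40  # $bn; the cost figure is invented
for quarter in range(1, 5):
    revenue *= 1.25
    cost *= 1.30
    print(f"Q+{quarter}: revenue ${revenue:.2f}bn, "
          f"gross margin {gross_margin(revenue, cost):.1%}")
```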

The deals with Meta and Anthropic are double-edged. On one hand, they provide multiyear revenue visibility that nearly every infrastructure company would envy. The $21 billion Meta contract alone, spread over several years, gives CoreWeave a baseline that supports its debt financing and its expansion plans. On the other hand, the deals concentrate CoreWeave's revenue in a very small number of customers who have the resources to renegotiate, to build in-house alternatives, or to play neoclouds against hyperscalers in procurement. This is the third quarter in a row that CoreWeave has disclosed large, concentrated customer commitments as the primary driver of its revenue growth, and the pattern is consistent across the neocloud category.

Nscale, the former crypto miner turned European neocloud, announced a deal on May 5 to supply 66,000 Nvidia Rubin GPUs to Microsoft's 1.2-gigawatt campus in Sines, Portugal, as The Next Web reported. The deal is structured as a long-term infrastructure lease, with Nscale providing and operating the GPU capacity inside Microsoft's facility. It is a different model from CoreWeave's landlord approach, but it shares the same DNA: a neocloud's balance sheet absorbs the GPU procurement cost and earns a margin on the operations. Whether the margin comes from owning the real estate or from operating the silicon, the exposure to Nvidia's pricing power is identical.

What the Hyperscalers Cannot Afford to Lose

FinOps leads at large AI customers confirm that the calculus is shifting. A FinOps director at a top-ten AI lab, who requested anonymity, said their team now models neocloud GPU reservations against hyperscaler committed-use discounts on three dimensions: per-GPU-hour cost, provisioning speed, and the switching cost of their model training and inference pipelines. "Twelve months ago, the neoclouds were 30 to 40 percent cheaper on a per-hour basis and could provision in days rather than weeks," the director said. "The gap is closing on both. The hyperscalers have gotten faster, and their discounts for long-term reservations are getting aggressive. The neoclouds' advantage is eroding from both sides." For inference workloads specifically, the calculus is even less favorable to the neoclouds, because the hyperscalers' global edge networks offer lower latency to end users, a dimension that bare-metal GPU access alone cannot address.
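The comparison the FinOps team describes can be sketched in a few lines. Every rate, discount, and switching-cost figure below is invented for illustration, not a real list price: the point is that a bigger committed-use discount can nearly close the gap once a one-time migration cost is amortized over the term.

```python
# Sketch of a reservation comparison with invented numbers: effective
# per-GPU-hour cost of a neocloud deal versus a hyperscaler
# committed-use discount, with one-time pipeline switching cost
# amortized over the reservation term. No rate here is a real price.

HOURS_PER_YEAR = 8760

def effective_hourly(list_rate, discount, gpus, term_years,
                     switching_cost=0.0):
    """Discounted hourly rate plus amortized one-time migration cost."""
    total_hours = gpus * HOURS_PER_YEAR * term_years
    return list_rate * (1 - discount) + switching_cost / total_hours

neocloud = effective_hourly(list_rate=2.50, discount=0.30,
                            gpus=1024, term_years=2)
hyperscaler = effective_hourly(list_rate=4.00, discount=0.45,
                               gpus=1024, term_years=2,
                               switching_cost=3_000_000)  # pipeline rework
print(f"neocloud:    ${neocloud:.3f}/GPU-hr")
print(f"hyperscaler: ${hyperscaler:.3f}/GPU-hr (incl. switching)")
```

In this invented scenario the neocloud still wins on price, but the margin between the two lines is exactly what the director says is narrowing as hyperscaler discounts deepen.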

The partner ecosystem provides a cross-confirmation signal. Datadog, Snowflake, and the other platform companies that sit between cloud infrastructure and enterprise customers are beginning to report that AI inference workloads are growing faster on the hyperscalers' platforms than on neocloud infrastructure. That does not mean neocloud inference revenue is shrinking; it means it is growing from a smaller base. The hyperscalers' ability to bundle inference with existing enterprise contracts, data residency commitments, and compliance certifications gives them a distribution advantage that the neoclouds cannot easily replicate. CoreWeave's response has been to build out its own software platform layer, including Kubernetes-native orchestration and integrated model serving, but it is competing against AWS SageMaker and Google Vertex AI, each of which has years of enterprise adoption behind it.

Lambda and Crusoe have taken different approaches to the same problem. Lambda has leaned into the developer experience, positioning its cloud as the easiest place to spin up a GPU cluster for experimentation and fine-tuning, a market segment that the hyperscalers have historically underserved. Crusoe has differentiated on energy infrastructure, co-locating its GPU clusters with stranded natural gas and renewable energy sources to offer a lower carbon footprint than grid-powered alternatives. Neither approach solves the core economic problem, but both represent bets that specialization beyond raw GPU access can create enough stickiness to survive the hyperscalers' capacity catch-up. Whether those bets pay off depends on whether the inference-cloud market fragments into specialized segments or consolidates around the platforms with the broadest distribution.

The component cost issue that CoreWeave flagged in its earnings call is not unique to CoreWeave. Every neocloud that operates Nvidia GPUs faces the same input cost curve. Nvidia's pricing for its latest generation of accelerators, including the B300 and the Vera Rubin platform expected later this year, has continued to rise generation over generation. The hyperscalers can absorb that cost increase because they spread it across a diversified revenue base that includes storage, networking, database, and SaaS revenue. The neoclouds have only GPU compute. When the cost of their sole product's sole input rises, there is no other line of business to offset it. That is the structural fragility McKinsey was describing.

Watch for CoreWeave's Q2 print in August. Three indicators will be more telling than the revenue and loss per share numbers. First, the gross margin trajectory: whether the cost of revenue is growing slower, at the same rate, or faster than the top line will reveal whether the company is gaining any operating leverage at scale. Second, the customer concentration: if the top three customers account for a larger share of revenue in Q2 than they did in Q1, the concentration risk intensifies. Third, the inference-to-training revenue mix: CoreWeave has been signaling that inference workloads are growing as a share of its revenue, but it has not broken out the figure. Any disclosure, even directional, will calibrate whether the inference-cloud pivot is materializing in revenue or remains a slide-deck aspiration.

The neocloud story is not broken. But the Q1 print made clear that the easy part, which was buying GPUs and renting them out during a shortage, is over. The hard part is building a platform that earns margin when the shortage ends.
