TechReaderDaily.com

Neoclouds Race to Own Inference as CoreWeave Q1 Exposes Cost of Scale

CoreWeave doubled revenue to $2.08 billion but widened losses and raised its capex forecast, a signal that the neocloud model is under pressure as GPU supply chain costs soar and AI demand shifts from training to inference serving.

A CoreWeave data center hallway with rows of GPU server racks illuminated in blue light. blogs.nvidia.com

On Thursday, May 7, CoreWeave Inc. reported first-quarter revenue of $2.08 billion, more than double the same period a year ago, and raised the floor on its 2026 exit revenue run rate to $18 billion. CEO Michael Intrator told analysts on the earnings call that the company had added $40 billion in new customer commitments during the quarter, bringing its total contracted backlog to $99.4 billion. Then the stock fell 8% in after-hours trading. The reason, buried in the same call, was a quiet revision to the company's capital expenditure forecast, driven by what Intrator described as "component cost inflation" across the GPU supply chain. For a company that spent roughly $2.60 on infrastructure for every dollar of revenue it booked in the first quarter, the market read the signal immediately: the cost of being AI's landlord is rising faster than the rent.
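The market's arithmetic is easy to reproduce. A back-of-envelope sketch, using the reported revenue and the $2.60-per-dollar ratio from the call (the absolute capex figure is implied, not disclosed), shows why even modest component-cost inflation moves the number materially:

```python
# Back-of-envelope check on CoreWeave's Q1 capital intensity.
# Revenue is the reported figure; the capex number is implied by the
# ~$2.60-per-revenue-dollar ratio cited on the call, not disclosed.
q1_revenue = 2.08e9                 # reported Q1 revenue, USD
capex_per_revenue_dollar = 2.60     # infrastructure spend per $1 of revenue

implied_capex = q1_revenue * capex_per_revenue_dollar
print(f"Implied Q1 infrastructure spend: ${implied_capex / 1e9:.2f}B")

# A 10% rise in component costs (an illustrative figure) widens the
# gap before any new revenue lands.
inflated_capex = implied_capex * 1.10
print(f"Same capacity at +10% component cost: ${inflated_capex / 1e9:.2f}B")
```

On those assumptions, a single quarter's implied infrastructure spend exceeds $5.4 billion, and a 10% component-cost bump adds roughly half a billion dollars before a single additional GPU-hour is sold.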

CoreWeave is the largest and most visible of a cohort of companies known as neoclouds, a term that has shifted from venture-capital shorthand to balance-sheet reality over the past eighteen months. The category includes Lambda, Crusoe, Nebius, and a handful of others. These are not hyperscalers in the traditional sense. They do not sell storage tiers, content delivery networks, or CRM integrations. They sell access to large clusters of NVIDIA GPUs, provisioned on demand, targeted at the handful of workloads that genuinely require thousands of accelerators running in parallel. For the past two years, that workload has been overwhelmingly training. But as CNBC reported in late April, the economics of this business are fragile enough that McKinsey has issued explicit warnings to investors: neoclouds emerged as stopgaps to address the GPU shortage, and their long-term viability depends on a market that is still taking shape.

That market is inference. Training a frontier model may require tens of thousands of GPUs running for months, but it is a finite event. Inference is ongoing, it scales with usage, and it generates recurring revenue. Every time a user prompts Claude, queries a Gemini-powered search result, or runs an enterprise agent through a model endpoint, an inference cycle executes. The hyperscalers understand this. Amazon Bedrock, Google Vertex AI, and Azure AI Foundry are all inference platforms first, training platforms second. The neoclouds, by contrast, built their businesses on the training bottleneck. The question that Q1 2026 raises is whether they can pivot their capital structures, their customer relationships, and their hardware procurement strategies fast enough to capture the inference wave before the hyperscalers absorb it entirely.

CoreWeave's two largest deals of the year illustrate both the opportunity and the concentration risk. In a 48-hour window in mid-April, the company announced a $21 billion agreement with Meta and a separate multiyear deal with Anthropic, as Forbes first reported. CoreWeave now claims to serve nine of the ten largest AI labs. These are not small accounts. Meta alone represents a commitment larger than most cloud providers' annual infrastructure revenue. But the contracts are overwhelmingly training contracts, signed at a moment when the largest labs are still building ever-larger models and need raw floating-point capacity above all else. If the industry's appetite for ever-larger pretraining runs plateaus, as some researchers at Anthropic and DeepMind have publicly suggested it might, the neoclouds will need inference workloads to keep those GPU clusters utilised. A contracted training cluster that finishes its run and goes idle is a depreciation clock ticking against no revenue.

The hardware procurement strategy that sustains these companies adds another layer of pressure. Neoclouds buy NVIDIA GPUs in bulk, often using debt financing secured against the future revenue those GPUs are expected to generate. When component costs rise, as they did in Q1, the economics of each financed cluster degrade before it has served a single workload. CoreWeave's CFO, speaking on the Q1 call, noted that higher prices for networking components, power distribution equipment, and HBM memory were pushing total cluster acquisition costs above the assumptions used in the company's underwriting models. This is not a temporary supply-chain hiccup. NVIDIA's own margins reflect the premium its ecosystem commands, and as the company transitions from H200 to B200 and then to Vera Rubin architectures, each generational leap requires neoclouds to recapitalise their fleets on a cycle that may be as short as eighteen months. A GPU cluster ordered in mid-2025 can be half-depreciated and already behind the performance curve by the time it is fully deployed and tenanted.
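The depreciation clock can be made concrete with a toy model. Every number below is an assumption chosen for illustration, not a disclosed figure; the point is the shape of the problem, not the magnitudes:

```python
# Illustrative (not disclosed) unit economics for a debt-financed GPU
# cluster facing an ~18-month architecture refresh cycle.
cluster_cost = 500e6          # acquisition cost, USD (assumed)
monthly_revenue = 18e6        # contracted revenue while tenanted (assumed)
useful_life_months = 48       # straight-line depreciation horizon (assumed)
tenancy_months = 24           # length of the training contract (assumed)
idle_months = 6               # gap after the contract ends (assumed)

monthly_depreciation = cluster_cost / useful_life_months

# Revenue stops when the tenancy ends; depreciation keeps accruing
# through the idle gap.
revenue = monthly_revenue * tenancy_months
depreciation = monthly_depreciation * (tenancy_months + idle_months)

print(f"Revenue over tenancy:     ${revenue / 1e6:.1f}M")
print(f"Depreciation incl. idle:  ${depreciation / 1e6:.1f}M")
print(f"Contribution before opex: ${(revenue - depreciation) / 1e6:.1f}M")
```

Even in this generous sketch, six idle months consume a large share of the contract's contribution margin, which is why keeping clusters tenanted across customer generations, ideally with inference workloads, is existential rather than optional.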

Lambda, the San Francisco-based neocloud that started as a GPU workstation vendor, has taken a different route. Rather than chasing the largest foundation-model training contracts, Lambda has focused on the mid-market: research labs, university groups, and enterprise teams that need clusters of dozens to hundreds of GPUs rather than tens of thousands. This strategy positions Lambda closer to inference and fine-tuning workloads by default, because its customers are not running six-month pretraining jobs. They are serving models internally, experimenting with agent architectures, and fine-tuning open-weight models like Llama and Mistral for domain-specific tasks. Lambda's unit economics are less spectacular on a per-deal basis, but its revenue mix is more diversified. The company does not disclose financials publicly, but a FinOps lead at a large enterprise customer who spoke off the record said Lambda's inference pricing on H100 instances was running roughly 40% below comparable AWS P5 instance pricing for sustained workloads, a gap that is difficult for hyperscalers to close without undermining their own margin structures.

Crusoe, the third major player, has differentiated on energy rather than on chip acquisition. Originally built around capturing stranded natural gas to power modular data centers, Crusoe has evolved into a vertically integrated operator that designs and builds its own facilities on timelines the hyperscalers cannot match. Its pitch to AI labs is not just cheaper compute but faster deployment. A Crusoe facility can go from greenfield to operational in under twelve months, compared to the two-to-three-year timelines typical of major cloud regions. This speed advantage matters enormously in an inference market where latency to end users is the binding constraint. A model served from a Crusoe point-of-presence in a secondary market can deliver lower round-trip time to regional users than the same model served from a hyperscaler's nearest region two states away. Infrastructure proximity is becoming an inference moat, and Crusoe has quietly assembled a portfolio of sites in locations the hyperscalers have yet to reach.

"Nobody is going to bet a production inference pipeline on a cloud provider that might not exist in three years. The hyperscalers sell safety. We have to sell performance and price, and we have to be immaculate on both," said a field CTO at one of the three largest neoclouds, speaking off the record.

The safety concern is not hypothetical. Neoclouds are capital-intensive businesses with narrow moats. Their primary input, NVIDIA GPUs, is available to any buyer with sufficient credit. Their primary differentiator, operational efficiency at scale, can be replicated by any competitor willing to invest in the same orchestration software and the same cluster management practices. CoreWeave's own S-1 filing, before its IPO, disclosed that its largest customer accounted for a significant share of revenue, and the subsequent concentration into Meta and Anthropic contracts has, if anything, deepened that dependency. A single customer deciding to bring its training workloads in-house, as Meta has historically preferred to do with its own data center builds, would leave a hole in CoreWeave's revenue that no mid-market inference customer could fill.

Then there is the TPU question. On May 5, The Information reported that Nebius, Lambda, and CoreWeave had all declined to offer Google's Tensor Processing Units as part of their cloud instances, despite an aggressive push from Google to expand the TPU ecosystem beyond its own cloud. Google has invested heavily in making TPUs a credible alternative to NVIDIA GPUs for inference, and on a price-per-token basis they are genuinely competitive. But by refusing to carry them, the neoclouds are effectively doubling down on the NVIDIA ecosystem at precisely the moment when inference economics might argue for hardware diversity. The reasoning, according to a solutions architect at one of the firms, is straightforward: their entire operational stack is built around CUDA, their customers' models are optimised for CUDA, and adding a second silicon platform would fragment their engineering resources without a clear revenue case. That may be true today. But if Google's TPU inference pricing continues to undercut GPU instances by the 30% to 50% margins some enterprise buyers have reported, the neoclouds' refusal to diversify could look less like strategic focus and more like lock-in.
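The scale of the pricing pressure is worth quantifying, even roughly. A quick sensitivity sketch using the 30% to 50% discounts enterprise buyers have reported (the baseline price-per-token and the monthly volume are placeholder assumptions, not quoted figures):

```python
# Rough sensitivity of a monthly serving bill to the reported
# 30-50% TPU inference discount. The baseline GPU price per million
# tokens and the token volume are assumed placeholders, not quotes.
gpu_price_per_mtok = 0.50     # USD per million tokens, assumed baseline
monthly_tokens_m = 200_000    # 200B tokens/month served, assumed volume

gpu_bill = gpu_price_per_mtok * monthly_tokens_m
for discount in (0.30, 0.50):
    tpu_bill = gpu_bill * (1 - discount)
    print(f"{discount:.0%} TPU discount: "
          f"${gpu_bill - tpu_bill:,.0f}/month saved")
```

At any realistic volume, discounts of that size compound into a procurement argument that engineering-convenience objections about CUDA will struggle to outlast.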

The hyperscalers are not standing still. AWS, Azure, and Google Cloud have all introduced inference-optimised instance types in the past six months, with reserved-instance pricing that narrows the gap with neocloud on-demand rates. They are also bundling inference with other services that neoclouds cannot offer: model evaluation tools, safety guardrails, vector databases, and the compliance certifications that enterprise procurement departments require. For a Fortune 500 company deploying a customer-facing chatbot, the decision to use Azure AI Foundry over a CoreWeave GPU cluster is not just about cost per token. It is about having a single vendor to call when the model hallucinates in production, when the SOC 2 auditor arrives, and when the CFO wants to renegotiate the contract. The neoclouds can win on one of those dimensions. Winning on all three simultaneously against vendors with decades of enterprise relationships and trillion-dollar balance sheets is a different kind of problem.

Wall Street is beginning to price these risks, albeit unevenly. CoreWeave's stock has been volatile since its IPO, and the post-earnings sell-off suggested that investors are scrutinising the gap between headline revenue growth and the capital required to sustain it. The company is spending $2.60 on infrastructure for every dollar of revenue, a ratio that makes sense only if the lifetime value of each deployed GPU cluster substantially exceeds its acquisition cost and if the clusters can be kept utilised across multiple customer generations. Meanwhile, privately held competitors like Lambda and Crusoe have raised capital at valuations that imply the market still sees a viable independent neocloud category, separate from the hyperscalers. But the valuations are contingent on growth continuing at current trajectories, and growth in turn depends on the inference market materialising at the scale and margin structure the neoclouds have underwritten.

One indicator worth tracking is the revenue mix at the partner ecosystem level. Datadog, which provides observability for cloud infrastructure, reported in its most recent quarterly filing that spending on GPU monitoring from neocloud customers grew faster than spending from hyperscaler customers for the third consecutive quarter. Snowflake and Databricks have both noted in earnings calls that a growing share of their own GPU-intensive workloads are running on neocloud infrastructure rather than on the hyperscalers' native compute. These are secondary signals, but they are the kind of signals that accumulate into trend lines. A neocloud that can point to inference revenue from production SaaS workloads, not just from research-lab training contracts, is a neocloud with a business model that survives beyond the current training capex cycle.

The geography of inference may also tilt in unexpected directions. Training a frontier model is location-agnostic beyond power cost and cooling efficiency. A cluster in Iowa and a cluster in Norway produce the same weights. Inference, by contrast, is latency-sensitive and increasingly regulated. Models served to users in the European Union must comply with the AI Act's requirements for transparency, risk assessment, and data governance. Models served to enterprises under HIPAA or FedRAMP constraints must run in certified facilities. Neoclouds that have invested in compliance infrastructure and in distributed points of presence may find that inference customers care more about data locality and certification than about the marginal dollar-per-GPU-hour savings that were the original neocloud pitch. Crusoe's modular data center model, which allows it to place capacity close to end users in specific regulated markets, is an underappreciated asset in this respect.

CoreWeave's Q1 numbers tell a story that is not cleanly bullish or bearish. Revenue doubled. The backlog swelled to nearly $100 billion. The company raised its full-year exit run rate and secured commitments from two of the most important AI labs in the world. But the cost of delivering on those commitments is rising, the customer base is concentrated, and the market these commitments were written for, a market dominated by ever-larger training runs, may not be the market that the next three years actually produce. The inference-cloud market that the neoclouds need is still embryonic, and it will belong to whichever provider can combine GPU supply, operational reliability, and enterprise trust in a single package. The hyperscalers have the trust. The neoclouds have the supply and the speed. The question for the second half of 2026 is which side closes the gap first.

Watch for CoreWeave's Q2 earnings call, expected in August. Two numbers will matter more than the top-line revenue. The first is the inference share of total compute hours sold, a metric the company has begun to disclose selectively. If that share rises above 25%, it will be the strongest signal yet that the neocloud business model can evolve beyond its training origins. The second is the customer concentration ratio. If the Meta and Anthropic deals begin to crowd out smaller accounts rather than serving as a base on which a broader customer portfolio can be built, the neocloud narrative will start to look less like the rise of a new cloud category and more like a financing mechanism for a handful of very large AI labs to externalise their infrastructure risk.
