Neoclouds Lead Inference Race as Capital Needs Outpace Revenue
CoreWeave, Nebius, and other GPU-native challengers are capturing inference workloads, but customer-concentrated revenue and soaring capital demands test their economics in 2026.
cnbc.com
On April 9, 2026, CoreWeave announced a $21 billion cloud services contract with Meta, running from 2027 through 2032. Forty-eight hours later, it disclosed a separate multiyear deal with Anthropic. Within two days, the New Jersey-based neocloud had booked commitments that, on an annualised basis, exceeded the cloud infrastructure revenue of most second-tier hyperscalers. CoreWeave's chief executive, Michael Intrator, told investors on the subsequent earnings call that the company now counts nine of the world's ten largest AI labs as customers. The phrase that lingered from the call transcript, however, was not about market share. It was about the kind of work those customers were sending to CoreWeave's GPU clusters, and how quickly that mix was changing.
The term neocloud entered the industry lexicon around 2023 to describe a cohort of cloud providers built from the ground up for GPU-accelerated workloads, as distinct from the general-purpose virtual machines and object storage that define AWS, Azure, and Google Cloud. Network World described them in April as "specialized cloud computing platforms that provide high-performance, GPU-centric infrastructure," noting that they had begun taking market share from traditional data center providers. The cohort includes CoreWeave, Lambda, Crusoe, and Nebius, the Amsterdam-based company that split from Yandex in 2024. By the first quarter of 2026, these companies were no longer a curiosity. They had become the infrastructure layer that frontier AI labs depended on when hyperscaler capacity ran short.
The neocloud thesis has always rested on a simple arbitrage: Nvidia's GPUs are scarce, enterprises and AI labs want them immediately, and the large public clouds cannot always allocate tens of thousands of H100s or B200s to a single tenant on short notice. Neoclouds stepped into that gap, building data centers dense with Nvidia hardware and renting it out in configurations that a model-training run actually needs. For two years, the dominant workload was training. But across the first half of 2026, a second pulse began to build, and it is the one that may define the neocloud category more durably than the first.
Inference workloads, the day-to-day operation of serving a trained model's outputs to users, have been the faster-growing share of GPU consumption inside neoclouds for at least three consecutive quarters, according to earnings commentary from both CoreWeave and Nebius. Training is spiky; a lab runs a cluster at maximum utilisation for weeks or months, then idles it while the next model is being designed. Inference is continuous, metered, and tied directly to end-user adoption. That makes it a better match for the neocloud business model, which depends on high and predictable utilisation to service the debt that finances the GPU fleet.
"In the neocloud race, inference optimisation is the competitive edge." (The Next Web, reporting on Nebius's acquisition of Eigen AI, May 2026)
The clearest signal of this pivot came on the first of May, when The Next Web reported that Nebius had agreed to acquire Eigen AI, a 20-person MIT spinout, for $643 million. At roughly $32 million per employee, the price made clear that this was not a headcount play. Eigen's technology maximises the number of tokens a given GPU can produce per second during inference, optimising across model architectures, batch sizes, and hardware configurations in ways that shave percentage points off cost-per-token. For a neocloud competing on the thinnest of margins, those percentage points compound across billions of tokens served daily. Nebius's chief executive cited the acquisition as foundational to the company's inference-first strategy.
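The link between tokens per second and cost-per-token can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the $3.00 GPU-hour cost, and the throughput figures are hypothetical assumptions, not numbers reported by Nebius or Eigen AI. Only the $643 million price and the 20-person headcount come from the reporting above.

```python
def cost_per_million_tokens(gpu_hour_cost_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to serve one million tokens on a single GPU."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return gpu_hour_cost_usd / tokens_per_hour * 1_000_000


# From the article: $643 million for a 20-person company.
price_per_employee = 643_000_000 / 20

# Hypothetical inputs (assumptions, not reported figures):
# the same GPU-hour cost, with a 10% throughput gain from software optimisation.
baseline = cost_per_million_tokens(gpu_hour_cost_usd=3.0, tokens_per_second=2_000)
optimised = cost_per_million_tokens(gpu_hour_cost_usd=3.0, tokens_per_second=2_200)
saving_pct = (baseline - optimised) / baseline * 100

print(f"Price per employee: ${price_per_employee:,.0f}")
print(f"Baseline:  ${baseline:.4f} per million tokens")
print(f"Optimised: ${optimised:.4f} per million tokens")
print(f"Saving:    {saving_pct:.1f}%")
```

Under these assumed inputs, a 10% throughput gain lowers cost per token by about 9%, because cost scales with the inverse of throughput when the hourly cost of the GPU is fixed.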
Nebius reported first-quarter revenue of $399 million, a sevenfold increase from the prior year, according to Barron's. The company also raised its contracted power capacity guidance by 25% and reported a pipeline 3.5 times larger than its current capacity. Earnings before interest, taxes, depreciation, and amortisation turned positive for the first time. The stock surged 18% on the day of the report and has more than doubled year-to-date. For a company that was, two years ago, a carve-out from a Russian search engine conglomerate, the trajectory is striking. But the capital required to sustain it is formidable.
CoreWeave, the largest of the neoclouds by revenue and contracted backlog, has raised more than $20 billion in capital so far in 2026, 24/7 Wall St. reported on May 19. The figure encompasses debt, equity, and structured financing tied to GPU collateral. In early May, the company reported its first-quarter results, raising the lower end of its annual capital expenditure forecast. SiliconANGLE noted that shares fell more than 8% after-hours as revenue guidance for the current quarter came in below consensus. Component costs were rising, particularly for networking and cooling infrastructure, and the market's patience for margin compression appeared to be thinning.
This is the core tension of the neocloud model as it matures. The same supply constraints that created the opening for neoclouds, Nvidia's inability to meet total global GPU demand through its own channel, also constrain the neoclouds themselves. Every new data center requires not just GPUs but high-speed interconnects, transformer-level power infrastructure, and liquid cooling systems, all of which are subject to their own supply-chain bottlenecks. The more successfully a neocloud signs large customers, the more capital it must raise to build capacity that will not generate revenue for twelve to eighteen months. It is a treadmill that accelerates with success.
Customer concentration compounds the risk. Morningstar characterised the Meta deal as "the largest deal CoreWeave has ever signed with a single customer." Add the Anthropic commitment and the $6 billion Jane Street contract reported by Reuters on April 15, and three customers likely account for the majority of CoreWeave's contracted future revenue. Concentration of this magnitude is not unprecedented in enterprise technology. But it means that any single customer's decision to build in-house capacity, or to reallocate budget toward a competing hyperscaler, would land disproportionately on the neocloud's income statement.
CNBC reported on April 25 that McKinsey had warned neocloud economics are fragile. The article noted that the stocks "hold more risk than other AI plays" and that the companies "emerged as stopgaps to address the GPU shortage." The framing is important: stopgaps are, by definition, temporary. If Nvidia's supply expands faster than expected, or if the hyperscalers' own GPU regions come online ahead of schedule, the scarcity premium on which neocloud margins depend could narrow. The neocloud counterargument, articulated across earnings calls and investor presentations, is that the market is growing fast enough to absorb all the capacity that everyone is building.
The Inference Imperative
If training was the neocloud wedge, inference is the expansion strategy. The economics are different at every level. Training workloads favour the largest, most homogeneous clusters. Inference workloads favour geographic distribution, low latency, and high throughput at the lowest possible cost per token. A neocloud that builds data centers in ten metro areas, each running inference-optimised software, can offer lower latency than a hyperscaler serving the same model from three regions. That is the structural advantage the neoclouds are racing to lock in before the incumbents replicate it.
Nebius's Eigen acquisition was the most conspicuous bet on inference-layer optimisation, but it was not the only one. In late April, a new entrant called Antimatter launched, describing itself as the "world's first vertically integrated neocloud for AI inference," according to a press release picked up by MarketWatch. The company claims to have secured over one gigawatt of power capacity through grid connection agreements and reserved sites. Vertically integrated is the operative phrase: Antimatter is not just renting out GPUs but building the full stack, from power generation agreements through to the inference-serving software layer, in an attempt to capture margin at every tier.
Lambda and Crusoe, the two other frequently cited neocloud names, have been quieter in public disclosures but have continued expanding their GPU fleets and customer rosters. Lambda, which started as a GPU workstation vendor for deep-learning researchers, now operates cloud regions offering on-demand access to Nvidia H200 and B200 clusters. Crusoe has differentiated partly on energy: it builds data centers co-located with stranded natural gas and renewable energy sources, marketing a lower-carbon compute product to AI labs with environmental commitments. Neither has disclosed the scale of customer contracts that CoreWeave and Nebius have made public, which leaves the market with an incomplete picture of the competitive field.
The hyperscalers are not standing still. AWS, Microsoft Azure, and Google Cloud all expanded their GPU-as-a-service offerings during the first quarter of 2026, according to CRN, which cited Synergy Research Group data showing the three companies continue to dominate overall cloud market share. But the hyperscalers face an inherent conflict: their most profitable workloads are CPU-based, long-running, and deeply integrated with their proprietary platform services. GPU-intensive AI workloads cannibalise data center power and cooling capacity that could otherwise be sold at higher margin. The neoclouds, unburdened by legacy platform economics, can price closer to the cost of the silicon.
What to Watch in the Second Half
Three indicators will determine whether the neocloud category consolidates its position or begins to look like a bridge technology. The first is the trajectory of CoreWeave's component costs. If networking and cooling prices continue to rise, every neocloud's marginal cost of adding capacity will climb. The second is customer diversification: whether CoreWeave can sign a fourth and fifth anchor tenant beyond Meta, Anthropic, and Jane Street, and whether Nebius can convert its 3.5x pipeline into signed contracts before the year ends. The third is the inference utilisation rate, the percentage of GPU hours devoted to serving models rather than training them. That number, disclosed selectively in earnings materials, is the clearest signal that the neocloud business model is maturing from a capacity play into a recurring-revenue platform.
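The third indicator is a simple ratio, and a minimal sketch shows how it would be computed from reported GPU hours. The function name and the hour counts are hypothetical, and the sketch assumes GPU hours split cleanly into inference and training; neither CoreWeave nor Nebius has disclosed figures in this form.

```python
def inference_share_pct(inference_gpu_hours: float, training_gpu_hours: float) -> float:
    """Percentage of total GPU hours spent serving models rather than training them."""
    total = inference_gpu_hours + training_gpu_hours
    if total <= 0:
        raise ValueError("total GPU hours must be positive")
    return inference_gpu_hours / total * 100


# Hypothetical quarterly GPU hours in millions (assumed, not reported).
quarters = {
    "Q1": (400.0, 600.0),  # (inference, training)
    "Q2": (600.0, 400.0),
}
shares = {q: inference_share_pct(inf, trn) for q, (inf, trn) in quarters.items()}

for quarter, share in shares.items():
    print(f"{quarter}: {share:.1f}% of GPU hours on inference")
```

A share that rises quarter over quarter would be consistent with the shift from spiky training demand toward continuous, metered inference revenue.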
There is a broader strategic question that will not be answered in a single quarter. The neoclouds have positioned themselves as the neutral infrastructure layer for AI, agnostic to which lab's model runs on their silicon. That neutrality is valuable to customers who do not want to train on a competitor's cloud. But it also means the neoclouds do not own the customer relationship at the application layer. If a lab's model becomes commoditised, or if a customer consolidates onto its own infrastructure, the neocloud sees the revenue leave with no platform stickiness to slow it down. This is the third quarter in a row that CoreWeave and Nebius have reported triple-digit revenue growth. The harder metric to track, and the one that will matter more over time, is whether their customers keep coming back for inference long after the training run is finished.