TechReaderDaily.com

Neocloud Inference Land Grab Accelerates, Margins in Doubt

CoreWeave's $99 billion backlog cements neoclouds as essential AI infrastructure, but rising component costs, customer concentration, and Wall Street margin scrutiny threaten their profitability.

In this article
  1. The Contender
  2. The TPU Question

On May 7, 2026, CoreWeave reported first-quarter revenue of $2.08 billion, a 112 percent increase from the same period a year earlier, and disclosed a contracted backlog that had swollen to $99 billion. The number was staggering. It also came with a footnote the market did not like: the company raised the lower end of its annual capital expenditure forecast, citing rising component costs, Reuters reported. Shares fell 11 percent in after-hours trading, a reminder that even the fastest-growing neocloud cannot outrun the physics of hardware procurement.

The earnings report arrived less than a month after the busiest 48 hours in CoreWeave's history. On April 9, the company announced a $21 billion, five-year deal with Meta, the largest contract it had ever signed, running from 2027 to 2032 as Morningstar detailed. The following day, it disclosed a separate multiyear agreement with Anthropic, Forbes reported. Taken together with a $6 billion commitment from trading firm Jane Street, CoreWeave had in a single quarter added more than $27 billion in new customer commitments. The neocloud, a category that barely existed three years ago, was suddenly indispensable.

What these customers are buying has shifted. When CoreWeave, Lambda, and Crusoe first emerged, their pitch was straightforward: rent Nvidia GPUs by the hour without the lock-in or complexity of a hyperscaler contract. The use case was overwhelmingly training. Large language models required months of compute on thousands of GPUs, and the hyperscalers (AWS, Azure, and Google Cloud) could not provision capacity fast enough. The neoclouds filled the gap. That gap is now closing, and in its place a different market is opening. Inference, the process of running a trained model to generate output, has become the primary driver of GPU demand, and it demands infrastructure built for low latency, geographic distribution, and pricing models that align with sustained usage rather than bursty training runs.

CoreWeave's positioning reflects this shift. In its Q1 earnings materials, the company cited inference workloads as a growing share of revenue. On May 12, independent benchmarks showed CoreWeave achieving top speed and price performance on Moonshot AI's Kimi K2.6 inference benchmark, Yahoo Finance UK reported. Bank of America raised its price target on CoreWeave stock the same day, explicitly citing accelerating inference demand as the catalyst. The neocloud is no longer merely a GPU rental shop for researchers who could not get capacity elsewhere. It is becoming the production backbone for AI-native companies that need to serve models at scale.

Yet the Q1 numbers revealed the tension at the heart of the neocloud business model. Revenue more than doubled, but the net loss widened to $1.40 per share, deeper than the $0.89 loss analysts had forecast. The company guided for second-quarter revenue between $2.45 billion and $2.6 billion, below the $2.69 billion consensus estimate. The miss was not dramatic, but it was the first time since going public that CoreWeave's forward guidance had undershot expectations. The stock, which had rebounded 103 percent from its March lows, gave back some of those gains.

The capex revision told a deeper story. CoreWeave raised the lower end of its annual capital expenditure forecast, citing a rise in the prices of components, Reuters reported. For a company whose entire value proposition rests on being able to acquire and deploy GPU clusters faster and cheaper than the hyperscalers, rising hardware costs compress the very margin that the business is built on. Nvidia's next-generation Vera Rubin platform is expected to command premium pricing when it begins shipping in volume later this year. Every neocloud will have to pay that premium.

The Contender

While CoreWeave absorbed the market's scrutiny, Nebius Group was having a very different earnings season. The Amsterdam-based company, which emerged from the restructuring of Russia's Yandex, reported a 684 percent year-over-year revenue increase in its first quarter, to $399 million. The stock jumped 18 percent in a single session, as Barron's reported on April 28. Nebius also turned EBITDA positive for the first time, a milestone that distinguished it from CoreWeave, which remained deeply unprofitable on that basis. The contrast between the two neoclouds was becoming the defining story of the category: CoreWeave had scale and backlog; Nebius had momentum and a cleaner path to profitability.

Nebius's own Meta deal, a $27 billion commitment reported in March, gave it a footing comparable to CoreWeave's in terms of anchor-tenant credibility. The company had also secured 1.2 gigawatts of power for a new data center campus, addressing one of the most binding constraints in AI infrastructure. Capacity expansion was the signal the market was watching most closely. Nebius raised its capacity guidance by 25 percent and reported a pipeline 3.5 times larger than its current deployments. In a market where available power and zoned land are scarcer than GPUs, this was the clearest signal that the strategy was working.

The TPU Question

On May 6, The Information reported that Nebius, Lambda, and CoreWeave had all declined to adopt Google's tensor processing units, or TPUs, despite what the outlet described as an aggressive push by Google to expand the reach of its custom AI chips. The refusal was strategically significant. Google has positioned its TPUs, particularly the forthcoming Ironwood generation, as a cost-effective alternative to Nvidia GPUs for inference workloads. By declining, the three largest neoclouds were betting that their customers would pay a premium for the Nvidia ecosystem's software maturity and tooling compatibility. It was also a signal that the neoclouds, despite competing with one another, shared a collective interest in keeping the hardware layer standardized around Nvidia's CUDA platform.

The TPU episode highlighted a structural advantage that neoclouds hold over hyperscalers in the inference market. AWS, Azure, and Google Cloud each have their own custom silicon programs (Trainium, Maia, and TPU, respectively), which they steer customers toward to improve their own margins. The neoclouds have no such conflict. Their only incentive is to deploy whatever GPUs their customers want, at the highest utilization rates they can sustain. For AI labs that have built their entire training and inference stacks on Nvidia's software ecosystem, the neoclouds offer a simplicity that the hyperscalers, for all their scale, cannot match.

Wall Street has noticed. On April 25, CNBC reported that analysts were growing increasingly bullish on neocloud stocks as a distinct AI investment thesis. The core argument was straightforward: AI infrastructure spending was shifting from training, which is concentrated and episodic, to inference, which is distributed and recurring. Inference revenue, the analysts reasoned, would be stickier, more predictable, and ultimately higher-margin than training revenue. Neoclouds that could establish themselves as the default inference layer for major AI labs would benefit from a compounding revenue stream that grew with their customers' end-user adoption.

The CNBC report also carried a pointed warning. McKinsey, the consulting firm, had cautioned that neocloud economics remained fragile. The firms were, in McKinsey's analysis, essentially arbitraging their access to Nvidia GPUs and cheap capital against hyperscaler pricing. If GPU supply normalized faster than expected, or if the hyperscalers caught up on capacity, the neoclouds' pricing advantage could evaporate. McKinsey's warning echoed a concern that had been circulating among cloud procurement teams at large enterprises: neoclouds were stopgaps, and stopgaps do not become enduring franchises by default.

The customer concentration data lent weight to the concern. CoreWeave's $99 billion backlog was heavily weighted toward a small number of very large customers. Meta alone accounted for $21 billion of committed spend. If any one of those anchor tenants decided to bring capacity in-house, or to diversify across multiple providers, the neocloud's revenue trajectory would change sharply. Meta, it should be noted, is simultaneously building its own AI infrastructure while also signing multi-billion-dollar deals with neoclouds. The arrangement works as long as Meta's demand for compute grows faster than its internal build-out. Should that equation flip, the anchor tenant becomes a competitor.
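The concentration risk reduces to simple arithmetic. A quick check, using only the two figures cited above:

```python
# Back-of-envelope concentration check. The $99 billion backlog and Meta's
# $21 billion commitment are the figures cited in the article; the rest of
# the backlog's composition is not disclosed.
backlog = 99e9
meta = 21e9
meta_share = meta / backlog

print(f"Meta alone is {meta_share:.0%} of CoreWeave's backlog")
```

A single tenant at roughly a fifth of committed revenue is the kind of exposure that turns one customer's build-versus-buy decision into a company-level risk.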

Rising component costs, the factor that CoreWeave cited in its capex revision, compound the concentration risk. Neoclouds fund their GPU purchases through a mix of equity, debt, and asset-backed financing secured against the GPUs themselves. When hardware costs rise, the financing becomes more expensive, and the collateral depreciates on a steeper curve. Nvidia's product cycles, which now run on roughly an annual cadence, mean that a GPU cluster purchased today will be a generation behind in 12 to 18 months. The neocloud model requires continuous capital raises just to maintain competitive infrastructure, let alone grow it.
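The squeeze described above can be made concrete with a toy model. This is an illustrative sketch, not CoreWeave's actual economics: every number (cluster cost, hourly rate, utilization, refresh cycle, interest rate, opex share) is an assumption chosen to show the mechanism, not a reported figure.

```python
# Illustrative sketch of GPU-rental margin compression. All parameter
# values are assumptions for demonstration, not disclosed figures.

def annual_margin(cluster_cost, hourly_rate, utilization,
                  refresh_years=1.5, interest=0.10, opex_ratio=0.15):
    """Rough annual operating margin for a debt-financed GPU cluster.

    cluster_cost  -- upfront hardware cost in dollars (assumed)
    hourly_rate   -- blended revenue per cluster-hour (assumed)
    utilization   -- fraction of hours actually sold (0..1)
    refresh_years -- competitive life before the next GPU generation
    interest      -- cost of asset-backed debt, per year
    opex_ratio    -- power, cooling, staff as a share of revenue
    """
    revenue = hourly_rate * 24 * 365 * utilization
    depreciation = cluster_cost / refresh_years  # straight-line over useful life
    financing = cluster_cost * interest
    opex = revenue * opex_ratio
    return revenue - depreciation - financing - opex

base = annual_margin(cluster_cost=100e6, hourly_rate=20_000, utilization=0.80)
# Same contract terms, but components cost 15 percent more:
pricier = annual_margin(cluster_cost=115e6, hourly_rate=20_000, utilization=0.80)

print(f"base margin:        ${base / 1e6:,.1f}M")
print(f"+15% hardware cost: ${pricier / 1e6:,.1f}M")
```

Under these assumed parameters, a 15 percent rise in hardware cost cuts the annual margin by roughly a quarter, because depreciation and financing scale with the purchase price while the contracted hourly rate does not move. That is the asymmetry behind the market's reaction to the capex revision.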

The neocloud landscape is also getting more crowded. Axe Compute, a Pittsburgh-based entrant, went public on the Nasdaq in early 2026 and in April announced a $260 million, three-year contract for a deployment of 2,304 Nvidia B300 GPUs, Business Insider reported. DigitalOcean, traditionally a developer-focused cloud, repositioned itself as an "Agentic Inference Cloud" and reported a 40 percent latency reduction and 50 percent faster training cycles for AI-native startups that migrated from hyperscalers, according to Morningstar. Crusoe, which differentiated itself by colocating GPU clusters with flare gas sites to reduce energy costs, continued to expand its inference-focused offerings. Lambda, the original neocloud for machine learning researchers, remained private but was reportedly scaling its inference capacity.

The neocloud thesis will be tested over the next two quarters. Three indicators are worth tracking. First, the share of revenue that CoreWeave and Nebius derive from inference versus training. Both companies have suggested the mix is shifting toward inference, but neither has disclosed a precise breakdown. When they do, it will be the clearest signal of whether the neoclouds are building recurring revenue streams or still riding the training wave. Second, gross margins. Rising component costs and the need to continuously refresh hardware will pressure margins at every neocloud. The ones that maintain or expand gross margin while growing will separate from the ones that grow revenue at any cost. Third, customer diversification. If the backlog at CoreWeave or Nebius broadens beyond a handful of anchor tenants, the McKinsey warning loses force. If it does not, the warning stands.

The quarter that ended on March 31, 2026, may be remembered as the moment the neoclouds stopped being GPU stopgaps and started being something more permanent. A $99 billion backlog says as much. But the market is now asking a harder question: not whether the neoclouds can book revenue, but whether they can earn it. The answer will arrive in a series of quarterly disclosures, each one a data point in an argument that has not yet been settled. Watch for the inference-versus-training revenue split in the Q2 reports. That number, when it comes, will matter more than the backlog.
