
Neoclouds Shift from Renting GPUs to Buying the Stack

With Nebius acquiring a $643M inference optimization startup and CoreWeave securing $21B from Meta, the neocloud race shifts from GPU capacity to per-token software margin, raising the stakes for full-stack ownership.


On May 1, 2026, Amsterdam-based Nebius Group agreed to acquire Eigen AI, a 20-person MIT spinout, for $643 million in cash and stock. The sum works out to roughly $32.15 million per employee. Eigen AI does not own a data center. It does not hold GPU purchase commitments. Its asset is an inference optimisation stack: software that wrings more tokens per second out of any given GPU SKU, at any batch size, without retraining the model. In the three months prior, The Next Web reported, inference optimisation had become the most contested layer in AI infrastructure.

The logic is spreadsheet-grade. A neocloud that charges per thousand output tokens can increase gross margin by three points if it raises throughput from 45 tok/s to 52 tok/s on the same H200 node. Operating at scale, across thousands of GPUs serving millions of inference requests per hour, those three points compound. The Eigen deal signals that the buying phase, the era of stockpiling GPUs and wiring megawatt-scale campuses, has a ceiling, and that ceiling is coming into view. What comes next is yield.
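A back-of-the-envelope sketch of that claim: the only inputs taken from the paragraph are the 45 and 52 tok/s rates; the price, node cost, and concurrency figures are hypothetical illustrations chosen so the uplift lands near the three points cited, not numbers from any company's disclosures.

```python
# Sketch: how a per-stream throughput gain moves gross margin on a node
# billed per million output tokens. All cost/price inputs are hypothetical.

def gross_margin(price_per_mtok: float, node_cost_per_hr: float,
                 tok_per_s_per_stream: float, concurrent_streams: int) -> float:
    """Gross margin fraction: (price - cost) / price, per million tokens."""
    tokens_per_hr = tok_per_s_per_stream * concurrent_streams * 3600
    cost_per_mtok = node_cost_per_hr / (tokens_per_hr / 1e6)
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

PRICE = 2.50       # $ per million output tokens (assumed list price)
NODE_COST = 45.0   # $ per node-hour, all-in (assumed)
STREAMS = 512      # concurrent sequences under continuous batching (assumed)

before = gross_margin(PRICE, NODE_COST, 45.0, STREAMS)  # 45 tok/s per stream
after = gross_margin(PRICE, NODE_COST, 52.0, STREAMS)   # 52 tok/s per stream

print(f"margin before: {before:.1%}, after: {after:.1%}, "
      f"delta: {(after - before) * 100:.1f} points")
# -> margin before: 78.3%, after: 81.2%, delta: 2.9 points
```

Under these assumptions the 15.6 percent throughput gain shows up as roughly three points of gross margin, because price holds steady while per-token cost scales inversely with throughput.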

Nebius is not alone in reading the shift. Forbes reported that CoreWeave signed two landmark agreements within 48 hours in April 2026: a $21 billion contract expansion with Meta Platforms and a multiyear deal with Anthropic. CoreWeave disclosed it now serves nine of the 10 largest AI labs. The company has moved beyond the spot-market GPU broker origins that defined its early years and into something closer to a long-duration infrastructure utility, with hyperscaler-grade counterparties and hyperscaler-scale contract values.

The numbers on the Meta side tell their own story. Meta is spending $48 billion across CoreWeave and Nebius, AOL Finance reported, while also building its own internal AI infrastructure. The dual-track strategy, rent capacity from neoclouds while building proprietary capacity in parallel, has become standard across the hyperscaler set. It answers a question that nagged at the neocloud thesis from the start: why would the largest AI builders rent from a startup when they could simply buy the GPUs themselves? The answer, in practice, is time-to-deployment and capital allocation. A neocloud that can deliver a 32,000-GPU cluster in 90 days, already wired, already cooled, already running an inference serving stack, saves a hyperscaler six to nine months of construction and commissioning.

CoreWeave's $6.8 billion contract expansion with Meta, detailed by Seeking Alpha, validates the model's durability. The Seeking Alpha analysis noted that global GPU rental prices for Nvidia hardware remain elevated, which benefits neoclouds that locked in purchase commitments 12 to 18 months ago at lower prices. The spread between what a neocloud pays Nvidia and what it charges the customer, on a per-GPU-hour basis, is the core economic engine of the sector. At current H200 spot pricing, that spread can exceed 40 percent before utilisation adjustments.

But the premium does not flow straight to the bottom line. Every neocloud carries a debt stack. CoreWeave and Nebius have each issued billions in debt to fund GPU procurement and data center construction, a financing structure that reporting has tracked across multiple quarters. The interest expense on that debt is the wedge between the headline rental spread and the actual free cash flow. When GPU prices soften, as Nvidia's Vera Rubin generation begins shipping and H200 supply catches up with demand, the spread compresses. Debt service does not.
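A stylised per-GPU model shows how fixed debt service turns a softening rental market into negative cash flow. Every input below is an assumption for illustration; none comes from either company's filings.

```python
# Stylised per-GPU annual cash model: the rental spread moves with the
# market price, but interest on GPU-backed debt does not. All assumed.

HOURS_PER_YEAR = 8760

GPU_CAPEX = 30_000.0      # $ purchase price per GPU (assumed)
DEPRECIATION_YEARS = 5    # straight-line useful life (assumed)
DEBT_SHARE = 0.80         # fraction of capex funded with debt (assumed)
INTEREST_RATE = 0.11      # coupon on GPU-backed debt (assumed)
OPEX_PER_HR = 0.90        # power, cooling, staff per GPU-hour (assumed)
UTILISATION = 0.85        # fraction of hours actually billed (assumed)

def annual_cash(rental_per_hr: float) -> tuple[float, float]:
    revenue = rental_per_hr * HOURS_PER_YEAR * UTILISATION
    depreciation = GPU_CAPEX / DEPRECIATION_YEARS
    interest = GPU_CAPEX * DEBT_SHARE * INTEREST_RATE   # fixed, price or no price
    opex = OPEX_PER_HR * HOURS_PER_YEAR                 # facility runs regardless
    gross_spread = revenue - depreciation - opex        # headline rental spread
    fcf_proxy = gross_spread - interest                 # after fixed debt service
    return gross_spread, fcf_proxy

for price in (2.50, 2.00, 1.60):  # $/GPU-hr: current, softened, compressed
    spread, fcf = annual_cash(price)
    print(f"${price:.2f}/hr -> spread ${spread:,.0f}, after interest ${fcf:,.0f}")
# $2.50/hr -> spread $4,731, after interest $2,091
# $2.00/hr -> spread $1,008, after interest $-1,632
# $1.60/hr -> spread $-1,970, after interest $-4,610
```

In this toy model, a 20 percent decline in the rental price wipes out free cash flow entirely while the interest line does not move, which is the wedge the paragraph describes.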

This is where the inference economics lens becomes essential. At batch size 1, the regime that most real-time chat applications actually run under, throughput per GPU drops sharply compared to the batch size 32 numbers that appear in marketing decks. Memory bandwidth becomes the bottleneck: every decode step must stream the full weight set from memory whether it serves one sequence or thirty-two. A server that can push 80 tok/s per GPU at batch size 32 may deliver 12 tok/s per GPU at batch size 1 on the same hardware. The gap between those two numbers is what Eigen AI's software targets: scheduling, KV-cache management, and kernel fusion techniques that narrow the degradation without requiring model changes.
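A stylised roofline model makes the mechanism visible. The sketch below assumes a dense 70B model with FP8 weights on an 8xH200 node and ignores KV-cache traffic, interconnect overhead, and scheduling losses, so its absolute numbers are optimistic upper bounds; the shape of the curve, not the values, is the point.

```python
# Stylised decode roofline: per-step time is dominated by streaming the
# weights from HBM, and batching amortises one weight read across many
# sequences. Simplifications: dense model, FP8 weights, no KV-cache or
# activation traffic, perfect compute/memory overlap.

PARAMS = 70e9          # 70B-parameter dense model
BYTES_PER_PARAM = 1.0  # FP8 weights (assumed)
HBM_BW = 8 * 4.8e12    # aggregate bytes/s, 8 x H200 at ~4.8 TB/s each
FLOPS = 8 * 2.0e15     # aggregate dense FP8 FLOP/s, rough peak (assumed)

def node_tok_per_s(batch: int) -> float:
    weight_read_s = PARAMS * BYTES_PER_PARAM / HBM_BW  # once per decode step
    compute_s = 2 * PARAMS * batch / FLOPS             # 2 FLOPs/param/token
    step_s = max(weight_read_s, compute_s)             # bound by slower side
    return batch / step_s                              # tokens finished per sec

for b in (1, 4, 32, 256):
    print(f"batch {b:>3}: ~{node_tok_per_s(b):,.0f} tok/s across the node")
# batch   1: ~549 tok/s
# batch   4: ~2,195 tok/s
# batch  32: ~17,554 tok/s
# batch 256: ~114,286 tok/s
```

Until the batch grows large enough to saturate compute, throughput scales almost linearly with batch size because the dominant cost, reading 70 GB of weights per step, is paid once regardless of how many sequences share it. That is the degradation curve the serving-software layer competes on.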

The per-token price that shows up on a customer invoice, the $0.15 per million input tokens or $0.60 per million output tokens that model providers quote, is the distillation of every layer beneath it. The chip. The cloud. The serving software. The model architecture itself. Nebius acquiring Eigen AI is a bet that the software layer is where the margin is migrating. If a neocloud can deliver the same token at 20 percent lower cost than a rival, it can either expand its own margin or pass the saving through to the customer and win volume. In a market where the two largest players, CoreWeave and Nebius, are already splitting tens of billions in hyperscaler contracts, volume is not theoretical.

TechCrunch reported that xAI has entered the neocloud conversation on different terms. Russell Brandom's piece noted that Musk's version is more ambitious than anything the current market has seen: data centers that may be in space by 2035, in partnership with SpaceX, and chips manufactured at the company's own Terafab facility, which would remove some but not all of Nvidia's pricing power from xAI's cost structure. Brandom observed that xAI's real business may be more about building data centers than training AI models.

The xAI dimension matters for two reasons. First, it raises the question of what a neocloud actually is. If xAI is both a model builder and an infrastructure provider, selling excess compute to third parties while consuming the rest internally, the line between customer and competitor blurs. Second, the vertical integration xAI is pursuing, chip design, fabrication, data center operation, and perhaps launch costs, represents the logical endpoint of the margin-capture argument. If you control every layer, you capture every layer's margin. The capital requirement is enormous. The execution risk is even larger.

Applied Digital occupies a distinct position in this stack. The Motley Fool reported that the company builds AI data centers for neocloud operators such as CoreWeave, and described a robust revenue pipeline feeding what it called the neocloud supercycle. Applied Digital is not competing for the inference optimisation software layer or the GPU financing layer. It is building the physical plants. Its economics are closer to a real estate investment trust than a technology company, and its margin profile reflects that, narrower but more predictable than the neocloud operators it serves.

The supercycle framing has gathered momentum. The Motley Fool noted in early April 2026 that CoreWeave and Nebius had outperformed every Magnificent Seven stock year-to-date. The same piece flagged that each is expected to grow revenue massively over the next several years. Stock performance and revenue growth are not the same as margin durability, but the market is pricing the neoclouds as infrastructure utilities with technology-company growth rates. That multiple expansion works until it does not.

Nebius reported higher-than-expected first-quarter capital spending, Reuters reported via MSN, driven by investments in GPU procurement and data center expansion tied to the Meta deal. The spending is aggressive. The company is running hot on capex at precisely the moment the inference software layer is becoming the differentiator. The Eigen AI acquisition, announced days after the earnings preview, reads as a hedge: hardware is necessary but not sufficient, and the margin that matters most over the next 24 months will come from software that makes each dollar of hardware spend go further.

A Seeking Alpha analysis framed the Nebius position bluntly, pointing to warnings from CoreWeave's Q1 2026 earnings as a caution for Nebius investors. The analysis noted that Nebius had surged over 100 percent year-to-date, fueled by resilient AI infrastructure capex and the Eigen AI acquisition, but that the CoreWeave results contained signals worth watching. The public summary did not enumerate those signals, but the directional concern is consistent across the sector: revenue growth from large contracts rests on debt-funded GPU purchases, and the spread between those two lines is thinner than the headline numbers suggest.

The inference-as-a-service market is coalescing around three competitive axes. The first is hardware access: who has the most H200s, the earliest Vera Rubin allocations, the densest interconnects. The second is software efficiency: who can serve the most tokens per GPU-second at batch size 1, the regime that matters for real users. The third is financing: who can carry the debt load longest if GPU spot prices decline or hyperscalers renegotiate contract terms. No single player leads on all three. CoreWeave leads on contract volume and counterparty quality. Nebius, with Eigen, is making the most explicit bet on software efficiency. xAI is pursuing vertical integration that could, in theory, make the first two axes irrelevant for its own workloads.

Where the per-token margin actually lands

Ask what a token costs to serve at scale and the answer fractures across at least four regimes; the two extremes make the point. At batch size 32, with continuous batching, on an 8xH200 node running a 70-billion-parameter dense model, a well-optimised serving stack can push past 45,000 output tokens per second across the node. At batch size 1, the same hardware might deliver 1,500 tok/s. Divide the all-in cost of that node, including power, cooling, networking, and debt service, by the tokens delivered, and the per-token cost at batch size 1 works out to 30 times higher than at batch size 32. Chat applications run at batch size 1. Enterprise API calls often run at batch size 1. The figure matters.
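The division the paragraph describes, made explicit. The two throughput figures are the ones quoted above; the all-in node cost is an assumed illustration.

```python
# Per-token cost in the two serving regimes. Throughput figures are from
# the text above; the node cost is an assumed round number for illustration.

NODE_COST_PER_HR = 60.0  # $ per node-hour: power, cooling, network, debt (assumed)

def cost_per_million_tokens(tok_per_s: float) -> float:
    tokens_per_hr = tok_per_s * 3600
    return NODE_COST_PER_HR / (tokens_per_hr / 1e6)

batched = cost_per_million_tokens(45_000)  # batch size 32, continuous batching
single = cost_per_million_tokens(1_500)    # batch size 1
print(f"batch 32: ${batched:.3f}/Mtok, batch 1: ${single:.2f}/Mtok, "
      f"ratio: {single / batched:.0f}x")
# -> batch 32: $0.370/Mtok, batch 1: $11.11/Mtok, ratio: 30x
```

Because the node's hourly cost is fixed, the cost ratio is exactly the throughput ratio; whatever the assumed node cost, batch size 1 tokens cost 30 times more here.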

The Eigen AI acquisition targets this gap specifically. The team's research, originating at MIT, focuses on scheduling algorithms and memory management techniques that flatten the degradation curve. If a neocloud can deliver 28 tok/s at batch size 1 where a competitor delivers 18 tok/s on identical hardware, it serves each token at roughly 36 percent lower cost. The neocloud with the better software stack can therefore price tokens a third lower and hold its gross margin, or price at parity and bank the difference. Over a five-year GPU depreciation cycle, that difference compounds into billions.
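The arithmetic, with only the 28 and 18 tok/s figures taken from the text and per-token cost assumed to scale inversely with throughput at fixed node cost:

```python
# Pricing headroom from a batch-size-1 throughput edge on identical hardware.
# Assumes node cost is fixed, so per-token cost scales as 1/throughput.

ours, theirs = 28.0, 18.0  # tok/s at batch size 1 (figures from the text)

cost_ratio = theirs / ours  # our per-token cost relative to the competitor's
print(f"our per-token cost: {cost_ratio:.1%} of the competitor's")
# -> our per-token cost: 64.3% of the competitor's

# Holding the same gross-margin percentage requires the same cost-to-price
# ratio, so price can fall by the same factor the cost did.
max_discount = 1 - cost_ratio
print(f"price cut available at unchanged margin: {max_discount:.0%}")
# -> price cut available at unchanged margin: 36%
```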

How long until the per-token price implied by these optimisation gains actually shows up on a customer invoice? The answer depends on competitive intensity. If both CoreWeave and Nebius are bidding for the same hyperscaler contract, the software efficiency edge gets priced into the negotiation. The customer captures some of the gain; the neocloud captures the rest. In a market with only two credible large-scale bidders, the split favours the suppliers. In a market with five, which is where the sector appears to be heading, the split tilts toward the customer.

Seeking Alpha's earnings preview of Nebius described the company as a top neocloud pick with a bullish rating and 42 percent to 138 percent upside potential over two years. The range is wide because the variables are many: GPU supply, hyperscaler demand, interest rates, the pace of Nvidia's next-generation rollout, and whether inference workloads grow faster than training workloads. That last variable is the one that matters most for the per-token economy. Training is a capital expense. Inference is an operating expense. The shift from the former to the latter, already underway, changes which layer of the stack captures the recurring revenue.

NeuReality, a hardware startup, unveiled an inference operating system in March 2026 that it calls NR-NEXUS, designed to run across any GPU and emerging XPU architectures, Morningstar reported. The product's positioning, an operating system for token factories, underscores the direction of travel. Inference is being industrialised. The factory metaphor works: raw materials in, finished tokens out, with yield defined as tokens per dollar of total cost. Every layer of the stack, from the chip to the cloud to the operating system to the application, is competing to be the yield bottleneck that captures the margin.

The next checkpoint to watch is CoreWeave's second-quarter earnings, expected in August 2026, which will be the first full quarter reflecting the Meta and Anthropic contract terms at scale. If the company reports gross margin above 55 percent on inference revenue, the thesis that neoclouds can sustain pricing power holds. If gross margin comes in below 45 percent, the story shifts: the hyperscalers are capturing the gains, and the neoclouds are becoming what their critics have always said they were, capital-intensive middlemen in a market where the only lasting moat is the chip itself. Until then, $643 million for 20 people says the smart money is betting on software.
