Agent-Native Runtimes Are the New Database Wars for Enterprise AI

On May 26, 2026, Google open-sourced something called Agent Executor, a runtime specification for running, suspending, and resuming AI agents in production. The announcement landed without a keynote or a launch event. It appeared as a repository, a documentation site, and a brief post on Google's developer blog explaining that the existing container-and-pod model, designed for stateless microservices that respond to HTTP requests and exit, was never built for software that might need to pause mid-task, wait three days for a human approval, and then resume from exactly where it left off. That gap, the post argued, is the gap between a demo and a deployed system.

The release is the sharpest signal yet in a quiet re-plumbing of the application stack that has been underway since late 2025. The premise is that AI agents are not simply chatbots with better prompts. An agent that books travel across three airlines, waits for a fare drop, or negotiates a procurement contract across a multi-week approval chain is a long-running stateful process. If the runtime under that agent cannot survive a pod restart, a node failure, or a cloud region outage without losing the state of every in-flight task, the agent is not ready for production. As MSN reported earlier this month, the feature that marked the real shift in 2026 was not better reasoning but persistent memory. ChatGPT's rollout of memory that survives sessions, Zoom AI Companion 3.0's ability to carry context across meetings and Slack threads, and Microsoft Copilot's deepening integration into the Office graph all point to the same architectural demand: the runtime must hold state.

The taxonomy is still settling, but two categories are emerging. The first is the agent-native runtime, a substrate that understands agent lifecycles natively: tool calls, sub-agent spawning, human-in-the-loop pauses, checkpointing, and durable resumption. The second is the agent control plane, a management layer that handles routing, governance, observability, rate limiting, and model selection across fleets of agents. The distinction is not academic. It determines which failures the system can tolerate and which it cannot, which is the only question that matters once an agent is handling real money or real customer data.

Google's Agent Executor lands squarely in the runtime category. It defines a standard interface for agent execution that is container-aware but not container-bound, meaning an agent can be paused, serialized to persistent storage, and resumed on a different node, in a different cluster, or after a maintenance window. The project ships with a Kubernetes operator, a CLI, and a set of SDKs in Python, TypeScript, and Go. In an analysis for InfoWorld, Anirban Ghoshal noted that the runtime addresses a set of operational challenges that have bedevilled early production deployments: agents that cannot recover from a crash without replaying their entire execution history from scratch, agents that leak tool-call state across sessions, and agents whose cost profiles become unpredictable because every retry re-burns LLM tokens from the beginning.

This problem is not new. The database community has understood durable execution since at least the 1980s, when Hector Garcia-Molina published his work on sagas and long-lived transactions at Princeton. What is new is the application of those ideas to workloads where the expensive step is not a database write but a call to a frontier model that costs dollars per million tokens. Resuming an agent from a checkpoint instead of replaying from scratch is not a latency optimization. It is a cost-control mechanism, and at the scale that enterprise deployments are now approaching, it can be the difference between a viable product and a service that burns its margin on redundant inference.
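The cost argument can be made concrete with a back-of-the-envelope sketch. Everything numeric below is an illustrative assumption rather than a figure from any provider: the price of $3 per million input tokens, the ten steps, and the 5,000 tokens of context each step adds. The model also assumes that every step re-sends the whole accumulated context as input.

```python
# Illustrative assumption: price per million input tokens, in dollars.
PRICE_PER_MILLION = 3.0


def input_tokens_per_step(new_tokens_per_step):
    """Input size of each step when every step re-sends the context so far."""
    sizes, context = [], 0
    for added in new_tokens_per_step:
        context += added
        sizes.append(context)
    return sizes


def recovery_tokens(new_tokens_per_step, failed_step, resume_from_checkpoint):
    """Input tokens re-sent to recover from a failure at failed_step (0-based)."""
    sizes = input_tokens_per_step(new_tokens_per_step)
    if resume_from_checkpoint:
        return sizes[failed_step]            # redo only the failed step
    return sum(sizes[: failed_step + 1])     # replay every step up to the failure


def dollars(tokens):
    return tokens * PRICE_PER_MILLION / 1_000_000


steps = [5_000] * 10  # hypothetical task: ten steps, 5,000 new tokens each
replay = recovery_tokens(steps, failed_step=9, resume_from_checkpoint=False)
resume = recovery_tokens(steps, failed_step=9, resume_from_checkpoint=True)
print(f"replay: {replay:,} tokens (${dollars(replay):.3f})")
print(f"resume: {resume:,} tokens (${dollars(resume):.3f})")
```

Under these assumptions a failure on the last step re-sends 275,000 input tokens on a full replay and 50,000 on a resume. The gap widens as the task gets longer, because replay cost grows with the square of the step count while resume cost grows linearly.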

Google is not alone in identifying this gap. In April, Orkes, the company built by the original architects of Netflix's Conductor workflow orchestration platform, raised $60 million in Series B funding led by AVP, as the Business Journals reported. The Orkes platform extends the durable-execution model that Netflix developed for its microservices into the agent domain, allowing developers to define agent workflows as code, with the platform handling retries, state persistence, and exactly-once execution guarantees. In a press release for the funding round, the company described its offering as a "durable workflow orchestration platform" purpose-built for deploying AI agents in production. The connection to Netflix's operational heritage is not incidental. Conductor was built to handle the failure modes of a streaming service running across multiple AWS regions, where a transient network partition could orphan a video encoding job. Agent workflows present a structurally similar challenge, except the unit of work is a chain of LLM calls rather than a media transcode.

The emerging landscape splits along a fault line that will be familiar to anyone who watched the database wars of the 2010s. On one side, the cloud hyperscalers are building vertically integrated stacks: Google's Agent Executor sits below its Gemini Enterprise Agent Platform, which Forbes reported replaced Vertex AI at Cloud Next 2026 as Google's primary AI platform offering. On the other side, independent platforms like Orkes, along with Temporal (which has been steadily adding agent-specific SDK features) and the newer open-source project Restate, are betting that enterprises want a runtime they can run anywhere, not one coupled to a specific cloud provider's model router.

The independent platforms share a common architectural assumption that is worth examining because it is not obviously correct. They assume that agent state is fundamentally like microservice state: it can be serialized into a linear event log, replayed deterministically, and checkpointed at well-defined boundaries. But an agent that has called a tool, received a multi-page JSON response, and is midway through reasoning about that response carries state that is harder to capture than a workflow step counter. The LLM's internal attention state sits inside the model provider's serving stack, and a runtime that reaches the model through a hosted API cannot serialize it. The prompt context window is the closest proxy, and it grows with every tool call, which means checkpointing an agent amounts to snapshotting a context window that may already be tens of thousands of tokens long, at a storage cost that is not trivial at scale.

This is where the definition of an agent-native runtime starts to have real teeth. A runtime that simply wraps an LLM call in a workflow DSL is not agent-native. It is a workflow engine with an LLM connector, and under failure it will replay from the last workflow step it recorded, typically the last successful tool call, re-burning all the tokens spent between that checkpoint and the point of failure. A genuinely agent-native runtime would, at minimum, understand the structure of an agent's execution well enough to checkpoint at natural boundaries: after a tool call returns, after a reasoning step concludes, after a sub-agent completes. The Google Agent Executor specification defines exactly these boundaries in its execution model, which is why its release matters more than a typical open-source utility launch.
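Checkpointing at natural boundaries can be sketched in a few lines. This is a minimal illustration and not the Agent Executor interface: `AgentState`, `InMemoryStore`, `run_agent`, and the `crash_before` switch are hypothetical names, the store is a dictionary standing in for durable storage, and each step is a plain function standing in for a tool call or reasoning step.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class AgentState:
    task_id: str
    context: list = field(default_factory=list)  # accumulated step results
    next_step: int = 0                           # index of the first unfinished step


class InMemoryStore:
    """Hypothetical stand-in for durable storage (object store, database)."""

    def __init__(self):
        self._blobs = {}

    def save(self, state):
        self._blobs[state.task_id] = json.dumps(asdict(state))

    def load(self, task_id):
        raw = self._blobs.get(task_id)
        return AgentState(**json.loads(raw)) if raw else None


def run_agent(task_id, steps, store, crash_before=None):
    """Run steps in order, checkpointing after each one completes."""
    state = store.load(task_id) or AgentState(task_id=task_id)
    executed = []
    while state.next_step < len(steps):
        index = state.next_step
        if crash_before == index:
            raise RuntimeError("simulated crash")
        name, fn = steps[index]
        result = fn(state.context)
        state.context.append({"step": name, "result": result})
        state.next_step = index + 1
        store.save(state)  # boundary reached: persist before moving on
        executed.append(name)
    return state, executed


steps = [
    ("search_fares", lambda ctx: "3 fares found"),
    ("reason", lambda ctx: "pick the cheapest"),
    ("book", lambda ctx: "confirmed"),
]
store = InMemoryStore()
try:
    run_agent("task-1", steps, store, crash_before=2)  # dies before booking
except RuntimeError:
    pass
state, executed_after_resume = run_agent("task-1", steps, store)
print(executed_after_resume)  # ['book']
```

The second call re-executes only the step that had not been checkpointed. The sketch sidesteps the hard part, which is a crash in the middle of a step, after the side effect has happened but before the checkpoint is written.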

The control-plane layer is consolidating at the same time, and here the competitive dynamics are different. Anthropic launched Claude Managed Agents in April, a cloud service that handles deployment, scaling, and monitoring of agents built on Claude models. SiliconANGLE reported that the service "shortens the development workflow from months to days" by abstracting away container orchestration, API key management, and model routing. DigitalOcean unveiled its own managed agents layer at Deploy 2026 in late April, part of a five-layer AI-Native Cloud that includes an inference engine and a model router, as Forbes detailed. The pitch to the mid-market is the same one DigitalOcean has always made: someone else handles the infrastructure so developers can write business logic. But for agent workloads, the infrastructure being abstracted is not just compute and storage. It is the runtime's durability guarantees, and those guarantees are hard to evaluate when they are hidden behind a managed service.

The application-layer evidence that this re-plumbing is underway is now visible in the tools that developers and knowledge workers use daily. Visual Studio Magazine reported in April that VS Code 1.116 added persistent debug logs for current and past agent sessions, a feature that acknowledges agents are long-running processes whose failures need to be diagnosed across time rather than within a single execution trace. The same release expanded agent terminal control, allowing agents to maintain state across shell sessions. Meanwhile, MSN described the May 2026 landscape as one in which "the definition of an AI assistant has fundamentally changed," pointing to Zoom AI Companion 3.0, Microsoft Copilot, and ChatGPT as products that now operate as autonomous workflow orchestrators rather than reactive chatbots.

What these products share under the hood is a requirement for durable agent state, which their vendors have been meeting in different ways. Zoom's approach, from what has been disclosed publicly, leans on a persistent session model where agent state is held in memory and checkpointed to durable storage when the agent enters a waiting state. Microsoft's Copilot infrastructure, built on top of the company's existing cloud investments, uses a combination of Durable Functions and Service Bus queues to manage long-running agent workflows. Neither company has open-sourced its runtime, which is why Google's Agent Executor is being watched closely by platform teams that want a standard they can inspect and operate themselves.

The question of governance follows directly. If agents are long-running stateful processes that can spawn sub-agents, call external tools, and wait for human input, then the governance surface is larger than a stateless microservice's: it grows with the number of external interactions multiplied by the number of failure modes possible at each one. ServiceNow addressed this directly at its Knowledge 2026 conference, where, as Forbes reported, the company expanded its AI control tower with agent governance features, including the ability to set policies that limit which tools an agent can call, which models it can route to, and what happens when a human approval times out. The governance layer is effectively becoming a required component of any agent-native stack, sitting alongside the runtime and the control plane.
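The three policy dimensions named above (tool limits, model routing limits, and approval timeouts) can be sketched as a small data structure. This is an illustration only and does not reflect ServiceNow's product or any vendor's API; `AgentPolicy`, `check_call`, `resolve_approval`, and the tool and model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentPolicy:
    allowed_tools: frozenset
    allowed_models: frozenset
    approval_timeout_s: int
    on_timeout: str  # what to do when nobody answers: "deny" or "escalate"


def check_call(policy, tool, model):
    """Decide whether an agent may call this tool using this model."""
    if tool not in policy.allowed_tools:
        return f"blocked: tool {tool!r} not allowed"
    if model not in policy.allowed_models:
        return f"blocked: model {model!r} not allowed"
    return "allowed"


def resolve_approval(policy, waited_s, decision: Optional[bool]):
    """Resolve a human-in-the-loop pause; decision is None while unanswered."""
    if decision is not None:
        return "approved" if decision else "denied"
    if waited_s < policy.approval_timeout_s:
        return "waiting"
    return policy.on_timeout


policy = AgentPolicy(
    allowed_tools=frozenset({"search_fares", "book_flight"}),
    allowed_models=frozenset({"model-a"}),
    approval_timeout_s=3 * 24 * 3600,  # three days
    on_timeout="escalate",
)
print(check_call(policy, "wire_transfer", "model-a"))
print(resolve_approval(policy, waited_s=4 * 24 * 3600, decision=None))
```

Making the timeout outcome an explicit field forces the policy author to decide what an unanswered approval means, instead of leaving the agent suspended indefinitely.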

The cost dimension of this architecture is still being mapped. A durable agent runtime stores checkpoints, which means it incurs storage costs that a stateless microservice does not. Those checkpoints are written to block storage or object storage on every significant state transition, which means the runtime's write amplification is a function of how chatty the agent is with its tools. An agent that calls ten tools per task and checkpoints after each call generates ten writes per task, plus the writes for the final result. If the checkpoint includes the full LLM context window, and that window is 100,000 tokens, each checkpoint costs roughly 400 KB in raw text (at about four bytes per token) plus serialization overhead. At a million tasks per day, that is 400 GB of checkpoint data even if only the latest checkpoint per task is kept, and up to roughly 4 TB if all ten are retained at full size. Both figures are before compression, before replication, and before considering that the checkpoints must be retained until the task completes, which for some enterprise workflows can be weeks.
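The storage arithmetic is easy to reproduce. The sketch below assumes roughly four bytes of raw text per token, which is a rule of thumb and not a measured figure, and compares three retention scenarios for a million tasks per day. The third scenario, where the context grows evenly from 10,000 to 100,000 tokens over ten checkpoints, is an added assumption for illustration.

```python
BYTES_PER_TOKEN = 4  # rough assumption for raw English text


def checkpoint_bytes(context_tokens):
    """Raw size of one checkpoint that stores the full context window."""
    return context_tokens * BYTES_PER_TOKEN


def daily_checkpoint_bytes(tasks_per_day, context_tokens_at_each_checkpoint):
    """Total bytes written per day, given the context size at each checkpoint."""
    per_task = sum(checkpoint_bytes(t) for t in context_tokens_at_each_checkpoint)
    return tasks_per_day * per_task


TASKS_PER_DAY = 1_000_000
latest_only = daily_checkpoint_bytes(TASKS_PER_DAY, [100_000])
all_full_size = daily_checkpoint_bytes(TASKS_PER_DAY, [100_000] * 10)
growing = daily_checkpoint_bytes(TASKS_PER_DAY, [10_000 * i for i in range(1, 11)])

for label, total in [
    ("one 100k-token checkpoint per task", latest_only),
    ("ten 100k-token checkpoints per task", all_full_size),
    ("ten checkpoints, context growing 10k to 100k", growing),
]:
    print(f"{label}: {total / 1e9:,.0f} GB per day")
```

With these inputs the three scenarios come to 400 GB, 4,000 GB, and 2,200 GB per day respectively, so the retention policy moves the total by an order of magnitude before compression or replication enter the picture.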

The operational profile ages differently at different scales. At 1 GB of state, a single Postgres instance backing a Temporal or Orkes deployment handles the workload comfortably; the query patterns are append-heavy and the read path is mostly point lookups by workflow ID. At 1 TB, the hot-working-set problem appears because agents that are actively running need their checkpoints in memory or on fast SSD, while completed or suspended agents can be paged to colder storage. At 1 PB, the system has to make a choice about which checkpoint format to use, because serializing full context windows becomes untenable and the runtime needs to support incremental checkpoints that capture only the delta since the last state transition. None of the current agent-native runtimes publicly documents an incremental checkpoint format, which suggests this is a problem being discovered rather than solved.
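An incremental checkpoint in its simplest form records only what was appended to the context since the previous checkpoint. The sketch below is one hypothetical format, not one documented by any runtime mentioned here; `make_delta`, `rebuild`, and the `base_len` field are invented for illustration, and it assumes the context is append-only.

```python
import json


def serialized_size(obj):
    """Bytes needed to store obj as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def make_delta(context, base_len):
    """Capture only the messages appended since the last checkpoint."""
    return {"base_len": base_len, "appended": context[base_len:]}


def rebuild(deltas):
    """Reconstruct the full context by applying deltas in order."""
    context = []
    for delta in deltas:
        if delta["base_len"] != len(context):
            raise ValueError("delta chain has a gap")
        context.extend(delta["appended"])
    return context


context, deltas = [], []
full_bytes = delta_bytes = 0
for i in range(10):  # ten tool calls, one checkpoint after each
    base_len = len(context)
    context.append({"role": "tool", "content": f"result of call {i} " + "x" * 200})
    delta = make_delta(context, base_len)
    deltas.append(delta)
    full_bytes += serialized_size(context)  # cost of a full snapshot each time
    delta_bytes += serialized_size(delta)   # cost of storing only the delta

print(f"full snapshots: {full_bytes} bytes, deltas: {delta_bytes} bytes")
```

The trade is the usual one for log-structured formats: writes shrink, but resumption has to read and apply the whole delta chain, and losing one delta makes every later one unusable. That is why such formats are normally paired with periodic full snapshots.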

The durability guarantees themselves are worth examining for what they quietly ignore. The workflow engines in this space typically promise exactly-once execution semantics for the workflow steps they control: tool calls, sub-agent invocations, and state transitions. But they cannot promise exactly-once execution of the LLM inference call itself, because the model provider's API does not expose an idempotency key. If the runtime checkpoints after sending a prompt to Claude or Gemini, and the network drops before the response arrives, the runtime must decide whether to retry. A retry re-executes the inference, burning tokens and potentially producing a different output. The runtime can detect the duplicate by comparing response IDs if the model provider returns them, but the duplicate detection is post-hoc, not preventive. This is not a bug in any particular runtime. It is a seam between the runtime's consistency model and the model provider's consistency model, and it will only be closed when the major model APIs add idempotency support to their inference endpoints.
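The seam can be shown with a small simulation. The runtime here keeps a journal keyed by a hash of the task, step, and prompt, which lets it reuse a response it has already stored but does nothing for a response that was lost in transit. All names are hypothetical (`InferenceJournal`, `request_key`, `call_model`), and `FlakyModel` is a fake provider, not a real client library.

```python
import hashlib
import json


class InferenceJournal:
    """Hypothetical runtime-side record of inference attempts."""

    def __init__(self):
        self.intents = set()   # keys of requests that were sent
        self.responses = {}    # keys of requests whose response was stored


def request_key(task_id, step, prompt):
    payload = json.dumps([task_id, step, prompt])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def call_model(journal, key, prompt, model_fn):
    """Return (response, status), reusing a stored response when one exists."""
    if key in journal.responses:
        return journal.responses[key], "reused"
    status = "retried" if key in journal.intents else "first"
    journal.intents.add(key)      # record intent before sending
    response = model_fn(prompt)   # may raise after the provider did the work
    journal.responses[key] = response
    return response, status


class FlakyModel:
    """Fake provider: runs inference every time, but loses the first reply."""

    def __init__(self):
        self.inferences = 0

    def __call__(self, prompt):
        self.inferences += 1      # tokens are burned at this point
        if self.inferences == 1:
            raise ConnectionError("response lost in transit")
        return f"answer #{self.inferences}"


journal, model = InferenceJournal(), FlakyModel()
prompt = "Summarise the fares"
key = request_key("task-1", 4, prompt)
try:
    call_model(journal, key, prompt, model)
except ConnectionError:
    pass
second = call_model(journal, key, prompt, model)
third = call_model(journal, key, prompt, model)
print(second, third, model.inferences)
```

The journal prevents a third inference, but the second one is unavoidable: the runtime knows it sent the request and has no way to learn whether the provider completed it. A provider-side idempotency key would let that retry return the original result without running inference again.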

The industry's direction of travel is legible from the sequence of announcements across the first half of 2026. In March, Kubescape 4.0 added runtime security scanning for AI agent workloads on Kubernetes, as InfoQ reported, acknowledging that agents are a new category of workload with their own attack surface. In April, OpenAI's Agents SDK reached production status, and Anthropic launched Claude Managed Agents. In May, ServiceNow deepened agent governance, Google open-sourced Agent Executor, and MSN declared that agentic AI had reached "a decisive inflection point, moving from experimental trials to widespread production deployment." The progression from security scanning to SDK stabilisation to runtime standardisation to governance is the same progression that containers followed between 2014 and 2018, compressed into roughly six months.

The analogy to containers is useful but incomplete. Containers standardised the packaging of an application's dependencies. An agent-native runtime standardises the execution of a process whose dependencies include not just libraries but external services, human judgement, and models that may change their behaviour between invocations. The runtime cannot control those external dependencies, which means its durability guarantees are always partial. The best it can do is make the boundaries of its guarantees explicit, so that the developer knows which failure modes the runtime absorbs and which ones it passes through. The Google Agent Executor specification does this by defining a clear contract for checkpointing, resumption, and tool-call replay. The independent platforms do it by exposing their event logs and letting operators inspect the exact sequence of steps that led to a particular outcome. In both cases, the runtime is making a trade that is common in databases but new to application infrastructure: trading completeness of the guarantee for clarity about what is not guaranteed.

A final structural question is whether the agent-native runtime will remain a distinct layer or be absorbed into the database. There is a precedent. PostgreSQL absorbed JSON, then full-text search, then vector search with pgvector, each time turning what had been a separate infrastructure category into a feature of the database. Durably executing a long-running agent workflow is, at its core, a state management problem, and databases are the tools humanity has built for state management. If Postgres can manage agent checkpoints as efficiently as a purpose-built workflow engine, and if it can expose the resumption API that agent frameworks need, then the runtime layer collapses into the storage layer. This is not imminent. The workflow engines have a decade of operational experience with the specific failure modes of long-running processes that databases treat as an anti-pattern. But the pressure to collapse layers is constant in infrastructure, and the agent-native runtime is no more immune to it than the message queue was.

The checkpoint to watch is the next Cloud Next and re:Invent cycle. If Google ships Agent Executor as a managed service integrated with its model router, and if AWS responds with an equivalent built on Step Functions and Bedrock, then the agent-native runtime will have completed its passage from open-source experiment to cloud-provider primitive. At that point, the question for platform teams shifts from "which runtime should we adopt" to "which runtime guarantees can we verify," and the answer will depend on whether the cloud providers publish their failure models with the same detail that the open-source projects do. If they do not, the enterprise default may shift toward the independent platforms, not for cost reasons but for auditability. That would be a notable outcome in a market where the hyperscalers typically win on operational burden alone.
