Prompt Injection Attack Surface Grows as AI Companies Fail to Close It

On May 29, 2026, The Hacker News reported on a vulnerability that security researchers at Permiso had named ChatGPhish. The finding was specific: an attacker could embed malicious Markdown links into a web page, ask ChatGPT to summarize that page, and watch the model render the links as trusted content inside the chat interface. The user, seeing a summary from a service they trust, would have no visible reason to doubt the links the AI had reproduced. The disclosure was not theoretical. Permiso demonstrated a working exploit chain, and the vulnerability sat at the intersection of two capabilities that users had come to rely on: web summarization and rich text rendering.

The ChatGPhish disclosure landed in a year that had already seen a cascade of prompt injection research. In the first six months of 2026, Anthropic, OpenAI, Google, and Meta each published findings quantifying how often their models could be diverted, hijacked, or made to act against a user's interests through nothing more than carefully placed text. Decrypt reported in late May that OpenAI had acknowledged the problem may never be fully solved. The core dynamic is deceptively simple: large language models process instructions and data through the same input channel, and they cannot reliably distinguish between the two. A system prompt that says 'summarize this page' and a hidden line on that page that says 'ignore previous instructions' enter through the same door.

What made ChatGPhish distinctive was not the injection mechanism itself but the trust surface it exploited. When ChatGPT summarizes a web page, it parses the page's Markdown, including any links or images the page contains. An attacker who controls the page being summarized can structure that Markdown so that ChatGPT's rendering presents a phishing link as though it were a legitimate reference. eWeek noted that the attack turned the AI assistant itself into the delivery channel, bypassing many of the conventional signals, such as a suspicious sender address or an unfamiliar URL in an email, that users and security tools rely on to detect phishing. The model, in effect, vouches for the attacker's content.

Five days after the ChatGPhish disclosure, on June 3, 2026, OpenAI began rolling out a response. Lockdown Mode, Firstpost reported on June 8, is a ChatGPT setting that disables several internet-connected capabilities, including live web browsing and deep research, when activated. The company positioned it as a protective measure for users at elevated risk of targeted prompt injection attacks. By June 8, TechRepublic confirmed the feature had been expanded to millions of eligible users.

But Lockdown Mode is an opt-in toggle, not a systemic fix. Gizmodo quoted OpenAI's own blog post stating plainly that 'Lockdown Mode is not intended for everyone.' The feature reduces the available attack surface by removing web-connected tools from a session, but it does not prevent prompt injection within the remaining context. A user with Lockdown Mode enabled can still be targeted through the content of a document they upload, a conversation history they resume, or a third-party integration they authorize. The setting shrinks the blast radius; it does not eliminate the vulnerability class. That distinction matters when the difference between a feature and its absence is the difference between being phished and not.

On June 2, 2026, VentureBeat reported a finding that gave the prompt injection discussion a concrete, empirical anchor. Anthropic had tested its browser agent, a system that can navigate the web and take actions on a user's behalf, against a set of adversarial web pages. Before the company's safeguards were engaged, the agent was successfully hijacked 31.5 percent of the time. The figure is not a measure of real-world compromise, and Anthropic did not claim that any user had been victimized. But it established a baseline for how often a capable agent, when exposed to untrusted content without mitigations, could be made to execute instructions that contradicted its user's intent.

VentureBeat also noted that Anthropic, OpenAI, Google, and Meta had each published prompt injection disclosures in 2026, but that no two companies measured the same thing. One might test a browser agent against adversarial pages; another might evaluate a chat model's resistance to hidden instructions embedded in uploaded files. The lack of a common evaluation framework meant that a 31.5 percent hijack rate in one context and a single-digit figure in another could not be compared. As a result, enterprise buyers had no standardized way to assess the prompt injection resilience of the models they were integrating into their products. Security assessment, for this entire vulnerability class, remained ad hoc.

At Infosecurity Europe 2026, held in London in early June, OWASP researcher Ariel Fogel delivered a blunt assessment. Prompt injection, Fogel said, remains an 'unresolved problem' within the AI security landscape. The statement is notable for what it rules out. It means that no general-purpose mitigation exists, that the underlying architectural condition that enables prompt injection, the conflation of instruction and data in a single context window, has not been addressed by any major vendor, and that every deployed AI agent that processes untrusted input is operating with a known, unclosed vulnerability class. OWASP has since introduced an agentic AI security maturity framework to help organizations assess governance readiness against adoption velocity.

The agent problem compounds this. A chatbot that produces text under adversarial influence is one risk; an agent that can read email, approve expense reports, or push code to a repository is another. TechRadar reported in late May 2026 that self-running AI agents were creating what it called the biggest security crisis of the year. The crisis is not that agents are inherently insecure, but that they are being deployed into enterprise environments that lack the monitoring, access-control, and input-validation infrastructure that conventional software has accumulated over decades. An agent given a corporate credit card and an inbox is a threat actor's ideal target.

The browser extension surface has proven especially fraught. On May 8, 2026, CSOonline reported that Anthropic's Claude in Chrome extension could be abused by malicious extensions exploiting overly trusted browser communication paths. A malicious extension installed alongside Claude could trigger unauthorized AI-assisted actions by sending instructions through channels the browser treats as legitimate. The finding illustrated a recurring pattern: the more integration points an AI agent has with its environment, the more surfaces an attacker can use to inject instructions that the agent will treat as authoritative. Each new plugin, API connection, or data source is a potential injection vector.

Forbes reported on May 22, 2026, that supply chains represent a primary target in the AI era, as organizations connect their AI systems to third-party data sources, APIs, and services. Each connection is a potential injection vector. A supplier's compromised web page, summarized by a procurement agent, becomes a route into the buyer's decision-making process. The same Markdown rendering issue that Permiso demonstrated in ChatGPhish applies to any AI system that fetches, parses, and summarizes untrusted web content on behalf of a user, and in a supply-chain context, that user may be making financial or operational decisions based on what the AI reports.

From Prevention to Resilience

At the Gartner Security and Risk Management Summit in early June 2026, TechRepublic reported, the analyst firm placed resilience, identity, and AI agent governance at the center of cybersecurity strategy. The shift is a pragmatic acknowledgment that prevention alone cannot keep pace with the prompt injection threat surface. If an attacker can hijack an agent 31.5 percent of the time before safeguards engage, as Anthropic's research indicated, then the security function must also plan for what happens after a compromise succeeds. Resilience means containing the blast radius: scoping agent permissions, rotating credentials, logging every action an agent takes, and building the ability to revoke an agent's access without disrupting the business.

On June 12, 2026, TechRepublic reported that CISA had issued a warning about a vulnerability in LiteLLM, an open-source AI gateway that enterprises use to manage access to multiple large language models. The flaw highlighted a governance gap: many organizations were deploying AI gateways that routed prompts to models without applying consistent authentication, authorization, or content filtering across all paths. An attacker who could reach the gateway with a crafted prompt could, in some configurations, bypass the controls that the organization believed were in place. CISA's advisory reinforced the Gartner message: identity and access governance are not optional layers on top of AI security; they are the foundation.

The same week, at Microsoft Build 2026, the company announced that MDASH, its agentic AI vulnerability detection system, had exited preview with more than 100 specialized threat-hunting agents, ZDNET reported. The system is designed to find real, exploitable flaws in AI deployments, including prompt injection vulnerabilities. The scale of the deployment, 100 agents hunting simultaneously across different surfaces, is an implicit acknowledgment of the problem's scope. A single scanner looking for a single injection pattern is not sufficient when the threat surface includes web summarization, document parsing, browser extension APIs, third-party integrations, and supply-chain data flows.

Prompt injection remains an unresolved problem., Ariel Fogel, OWASP researcher, speaking at Infosecurity Europe 2026

The AI-prompt-injection threat surface in mid-2026 is defined by three uncomfortable facts. First, every major AI vendor has acknowledged the problem, and none has solved it. Second, the agents being deployed today can take actions in the real world, raising the stakes from misinformation to financial loss and data exfiltration. Third, the evaluation frameworks that would let buyers compare the resilience of different models do not yet exist. The most important metric to watch in the second half of 2026 is not the number of prompt injection disclosures but whether any vendor ships a model that can reliably distinguish between what it is being asked to do and what a third party has hidden in the content it is being asked to process. Until that happens, Lockdown Mode and its equivalents will remain what they are: circuit breakers, not cures.

From Prevention to Resilience

Read next

Open-Source Supply Chain Attack Sweeps npm, PyPI, Docker in 48 Hours

Get the Daily Briefbefore your first meeting.

Get the Daily Brief
before your first meeting.