AI security 2026

Agentjacking: What It Is, How It Works, and How to Stop It

In June 2026, Tenet Security documented a new attack class called agentjacking, in which malicious instructions hidden inside trusted data sources (such as Sentry error reports fetched over MCP) trick AI coding agents into executing attacker-controlled commands with developer-level privileges, at an 85% success rate across Claude Code, Cursor, and Codex.

What is agentjacking? Agentjacking is a prompt-injection attack that targets AI coding agents rather than human users. An attacker embeds hidden instructions inside data an agent trusts, such as a Sentry error report, a support ticket, or a log entry. When the agent reads that data through an MCP integration, it treats the embedded instructions as legitimate tasks and executes them autonomously, using the developer's own credentials and file system access.

Why agentjacking is different from ordinary prompt injection

Classic prompt injection targets chatbots that reply to humans. Agentjacking targets agents that act on systems: they run shell commands, install packages, call APIs, and read credential files. The attacker does not compromise the developer's machine directly. They only need write access to a data source the agent trusts.

In the Sentry variant documented by Tenet Security and published by the Cloud Security Alliance on 12 June 2026, the entry point is Sentry's Data Source Name (DSN): a write-only credential Sentry embeds in frontend JavaScript so browsers can report errors. Tenet identified at least 2,388 organisations with injectable DSNs discoverable through passive reconnaissance alone.

The six-step attack flow

Understanding the mechanics makes the defence obvious.

DSN discovery. The attacker finds a Sentry DSN in a public JavaScript bundle, a GitHub repository, or an indexing service. No authentication is required; DSNs are write-only and intentionally public.
Event injection. The attacker sends an HTTP POST to Sentry's ingest endpoint, creating a fake error event. The message, stack trace, breadcrumbs, and context fields are all attacker-controlled.
Payload embedding. Inside the fake error, the attacker writes markdown that mimics a legitimate resolution note: for example, a code block instructing the agent to run npx @attacker-package to "validate the fix."
Agent query. The developer asks their AI coding agent (Claude Code, Cursor, or Codex) to investigate unresolved Sentry issues. The agent fetches issues through the Sentry MCP integration.
Autonomous execution. The agent reads the injected event, treats the embedded instruction as a legitimate task, and runs the attacker's package with the developer's full local privileges, no separate approval step required.
Credential exfiltration. The malicious package probes for environment variables, AWS credentials (~/.aws/config), npm tokens (~/.npmrc), Docker credentials (~/.docker/config.json), SSH keys, and git credential helpers, then sends results to an attacker-controlled server.

The attack bypassed EDR, WAF, IAM policies, VPN controls, and explicit system-prompt instructions to distrust external data during Tenet's testing. Every step uses authorised tools and legitimate-looking behaviour, so security tooling sees nothing anomalous.

Tenet confirmed more than 100 real-world executions across Fortune 500 firms, hosting providers, and individual developers spanning six continents. The overall success rate was 85 percent; the 15 percent failure rate came primarily from agents that prompted for confirmation before running unfamiliar npx commands.

The generalisation: Sentry is one example, not the full attack surface

Agentjacking is not a Sentry bug. It is an architectural consequence of giving agents read access to any data source that accepts untrusted writes. Issue trackers, support queues, log aggregators, and code-review platforms carry the same risk. Sentry acknowledged the disclosure on 3 June 2026 but characterised a comprehensive fix as "technically not defensible" at the ingestion layer, deploying only a narrow content filter for the specific payload string from the research period.

The core problem: AI coding agents cannot reliably distinguish descriptive data from embedded instructions. The same properties that make agents productive, broad tool access, standing credentials, and autonomy to act on what they read, are exactly the properties an attacker borrows.

At Digiton, we build and operate AI agent systems for clients across 8 countries. Agentjacking is the attack class we consider highest-priority in our 2026 AI operations framework because the blast radius scales with agent capability: the more an agent can do, the more damage a hijacked one causes.

Defense checklist: copy-paste and implement today

These controls are ordered by effort. Implement them top-to-bottom; the first three can be done in under an hour.

Treat every MCP tool output as untrusted user input. Never let agent-read data flow directly into command execution. Apply the same validation you would apply to a form submission from an anonymous user.
Require a human review step before any autonomous shell or package execution. Disable auto-run for npx, pip install, cargo, and equivalents. The 15 percent of agents that survived Tenet's testing all required explicit confirmation before running unfamiliar commands.
Audit and remove unnecessary MCP integrations. Disable Sentry MCP (and any MCP server surfacing externally-influenced data) where it is not operationally required. Every connected data source is a potential injection channel.
Rotate exposed credentials immediately. Search public repositories and JavaScript bundles for Sentry DSNs matching https://[a-f0-9]{32}@o[0-9]+\.ingest\.sentry\.io. Rotate found tokens and add this pattern to your secret-scanning ruleset.
Use least-privilege credentials for agent processes. Agents should run with scoped, short-lived tokens: not the developer's full AWS profile, not a root npm token. Scope credentials to exactly what the agent legitimately needs.
Authenticate and pin your MCP server inventory. Treat MCP servers like third-party software dependencies: maintain an approved list, authenticate servers with signed certificates or controlled API keys, and block connections to unrecognised endpoints.
Monitor agent processes for unexpected outbound connections. Agentjacking exfiltrates data by calling an attacker server. An agent process connecting to an unknown host or an unfamiliar npm CDN is a high-confidence compromise signal.
Add prompt injection testing to your red-team program. Inject synthetic malicious instructions into the data sources your agents query and verify agents surface them for human review rather than executing them, the same way you test SQL injection in a web app.
After any suspected compromise, rotate everything the agent could reach. AWS keys, git tokens, npm tokens, Docker credentials, and SSH keys. Retain agent action logs for forensic review.

For teams building on models like GPT-5.6 or Claude via the API, the same principles apply: validate tool outputs server-side, scope permissions to the minimum required, and add a human checkpoint before any irreversible action.

If you are building or operating agent systems and want an independent security review of your MCP configuration, tool allowlists, and credential boundaries, contact Digiton for an AI-agent security audit. We review the full stack and deliver a prioritised remediation plan within five business days.

For a broader view of agentic risks in the 2026 landscape, see our practice page on platform development and agent infrastructure.

Frequently asked questions

What is agentjacking?

Agentjacking is an attack class documented by Tenet Security in June 2026 in which malicious instructions are hidden inside data sources that AI coding agents trust, such as Sentry error reports fetched over MCP. The agent reads the data, treats the embedded instructions as legitimate tasks, and executes attacker-controlled commands using the developer's own credentials and file system access.

Which AI coding agents are vulnerable to agentjacking?

Tenet Security confirmed that Claude Code, Cursor, and Codex are all vulnerable. Any AI coding agent that reads externally-influenced data through an MCP integration and can execute shell commands or install packages autonomously carries the same risk. The attack is architectural, not agent-specific.

What is the attack success rate for agentjacking?

Tenet Security reported an 85 percent success rate across leading AI coding agents in controlled tests with more than 100 consenting organisations. The 15 percent that were not compromised shared a common trait: they required explicit human confirmation before running unfamiliar commands such as npx package installations.

How does the Sentry DSN get exposed in the first place?

Sentry DSNs are write-only credentials that Sentry's own documentation instructs developers to embed in client-side JavaScript bundles so browsers can report errors directly. They are public by design. Tenet identified at least 2,388 organisations with injectable DSNs discoverable through passive reconnaissance: JavaScript inspection, Censys searches, and GitHub search, no authentication required.

What data does an agentjacking payload steal?

Tenet's proof-of-concept payload probed for all environment variables, AWS credentials at ~/.aws/config and ~/.aws/credentials, npm tokens at ~/.npmrc, Docker credentials at ~/.docker/config.json, SSH keys, git credential helpers, and Kubernetes cluster tokens. Everything the developer's account can read is within reach.

Did Sentry fix the agentjacking vulnerability?

Sentry acknowledged the disclosure on 3 June 2026 but characterised a comprehensive platform-level fix as technically not defensible at the ingestion layer. Sentry deployed a content filter targeting only the specific payload string used during the research period. The underlying write-access attack surface remains. Mitigation responsibility sits with agent operators and model vendors.

Does agentjacking only work through Sentry?

No. Sentry is one example of a broader attack pattern. Any MCP server or agent data source that surfaces externally-influenced content is a potential injection channel: issue trackers, support queues, log aggregators, code-review platforms, and CI/CD dashboards. The vulnerability is architectural: agents cannot reliably distinguish descriptive data from embedded instructions.

Why do EDR, WAF, and IAM controls not stop agentjacking?

Every step in the agentjacking chain uses authorised tools and legitimate-looking behaviour. The agent is a trusted process with developer credentials, the MCP call is a normal API request, and the package execution looks like ordinary development activity. Security tooling has no signal to act on because nothing in the observable behaviour is abnormal from the defender's perspective.

What is the fastest single thing I can do to reduce agentjacking risk today?

Disable autonomous shell and package execution in your AI coding agent configuration and require explicit human confirmation before any npx, pip install, or equivalent command runs. This single control was the primary differentiator between the 85 percent that were compromised and the 15 percent that were not in Tenet's research.

How do I find exposed Sentry DSNs in my codebase?

Search your public repositories and built JavaScript bundles for the pattern https://[a-f0-9]{32}@o[0-9]+\.ingest\.sentry\.io. Add this regex to your secret-scanning tool (GitHub Advanced Security, GitGuardian, or Trufflehog) and make it a required CI check. Rotate any DSNs found immediately and generate fresh ones from the Sentry project dashboard.

What does least-privilege mean for AI agent credentials?

Agent processes should run with scoped, short-lived tokens limited to exactly the services the agent legitimately needs: a read-only repository token, not a full AWS profile; a single-registry npm token with publish scope only for the specific package, not a root token. Credentials should expire automatically and be rotated on each agent session where feasible.

How should teams add agentjacking testing to a red-team program?

Inject synthetic malicious instructions into the data sources your agents query: create a fake Sentry event, a poisoned issue tracker ticket, or a crafted log entry containing an embedded command, then verify the agent surfaces it for human review rather than executing it. Treat this exactly as you would treat SQL injection testing in a web application pentest.

Is agentjacking relevant if we run agents in a sandboxed environment?

Sandboxing reduces blast radius but does not eliminate the risk. Tenet confirmed successful executions in sandboxed agents, VPN-protected networks, and GCP and AWS containers. Credential exfiltration can still occur if the sandbox has network egress and the agent's token scopes include sensitive services outside the sandbox boundary.

Can Digiton help us audit our AI agent security posture?

Yes. Digiton delivers AI-agent security audits covering MCP configuration, tool allowlists, credential scoping, data-source trust boundaries, and runtime monitoring. We review the full stack and return a prioritised remediation plan within five business days. Contact us at digiton.ai/contact to arrange an assessment.

State of AI Operations for SMBs 2026 AI agency in Lisbon Google Preferred Sources guide

Ready to put AI to work?

Book a discovery audit and we will map the highest-ROI AI agents and automations for your business.

Book a discovery audit →