
An AI agent supply chain attack compromises the delivery channels of external components that agents load at runtime, such as MCP servers, Skills, and plugins, in order to inject arbitrary code or malicious instructions into enterprise AI execution environments.
While traditional supply chain attacks targeted libraries and container images, the attack surface in the AI agent era has expanded to MCP servers and Skills that are dynamically loaded at runtime. Design decisions that Anthropic itself has acknowledged as intentional, combined with the large number of publicly exposed MCP servers, are accelerating the problem.
This guide is intended for IT, SRE, and security personnel, and explains a 3-step process for designing a 3-layer defense consisting of: (1) allowlisting trusted sources, (2) least privilege and sandboxing, and (3) input/output guards and audit logging. By the end, readers will be equipped to immediately decide "what to restrict first and what to monitor" in their own organization's agent operating environment.
While traditional supply chain attacks targeted "libraries and containers," the attack surface in the AI agent era has expanded to MCP servers, Skills, and plugins that are dynamically loaded at runtime. Because these components receive external instructions and execute commands, perform file operations, and make API calls on enterprise PCs, the impact propagates immediately to corporate business systems.
This section organizes the structural reasons behind the expanded attack surface and documents real-world cases that have come to light.
MCP (Model Context Protocol) is a common protocol that enables agents to invoke external tools, data sources, and code. The fundamentals are covered in Introduction to AI Agent Protocols (MCP & A2A). Skills extend this by packaging reusable workflows into distributable units.
The execution model consists of three layers.
| Layer | Role | Where Risk Resides |
|---|---|---|
| Agent core | Reasoning and decision-making | Prompt injection |
| MCP client | Handles protocol communication | Communication tampering / authentication bypass |
| MCP server / Skill | Executes actual commands | Arbitrary code execution / data exfiltration |
In particular, the lower-layer MCP server is designed to "execute OS commands in response to requests sent from the client." This execution capability is precisely what makes it an entry point for supply chain attacks.
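To make that execution capability concrete, here is a deliberately simplified sketch of a tool handler in the style of an MCP server; it is not tied to any specific SDK, and the function name is illustrative. Any component that turns model-supplied arguments into OS command invocations becomes a code execution primitive as soon as an attacker can influence the model's input.

```python
import subprocess

# Illustrative only: a tool handler in the style of an MCP server tool.
# The arguments ultimately originate from the model, and therefore,
# indirectly, from any text the model has read.
def run_grep(pattern: str, path: str) -> str:
    """Search files for a pattern and return matching lines."""
    # Passing a list (not a shell string) avoids shell injection,
    # but the handler can still read any file the process can access.
    result = subprocess.run(
        ["grep", "-rn", pattern, path],
        capture_output=True, text=True, timeout=10,
    )
    return result.stdout

# If a prompt-injected document convinces the agent to call
# run_grep("password", "/home"), exfiltration is one tool response away.
```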
Multiple public sources have reported incidents related to AI agent supply chains. Three representative cases are listed below.
Regarding the state of defensive readiness, Cisco's "State of AI Security 2026" reported that only approximately 29% of organizations are prepared to deploy agentic AI in production. The attack surface continues to expand while defensive measures have yet to catch up.
The first line of defense is to explicitly define which MCP servers and Skills are permitted to execute via an allowlist. Allowing everything by default is effectively no defense at all, and given that design-level vulnerabilities have come to light, there is no starting point other than restricting trust in delivery channels on the enterprise side.
The delivery channels to be assessed fall into three categories: (1) MCP servers on public networks, (2) Skills distributed through marketplaces, and (3) internally developed MCP/Skills. The nature of the risk differs for each.
When using public MCP servers, verify at minimum the following:
Public MCP servers accessible without authentication by anyone are not suitable for business use. It is practical to start by limiting usage to only "MCP servers confined within the internal network" or "provider-direct MCP with authentication," and to evaluate and incorporate externally public MCPs incrementally.
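As a minimal sketch of what "deny by default" can look like on the client side, the snippet below gates MCP server connections against a reviewed host allowlist; the host names and registry structure are assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical, internally reviewed registry of approved MCP server hosts.
# In practice this would live in a config file managed via code review.
APPROVED_HOSTS = {
    "mcp.internal.example.com",   # MCP server confined to the internal network
    "vendor-mcp.example.com",     # provider-direct MCP with authentication
}

def connection_allowed(server_url: str) -> bool:
    """Deny by default: connect only to MCP servers whose host is allowlisted."""
    host = urlparse(server_url).hostname or ""
    return host in APPROVED_HOSTS

assert connection_allowed("https://mcp.internal.example.com/tools")
assert not connection_allowed("https://random-public-mcp.example.org/")
```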
Skills and MCP packages should be treated under the assumption that they "can be tampered with, just like container images or npm packages."
The ideal approach is to maintain an "internal mirror that disallows automatic updates from the marketplace and distributes only versions that have passed internal review." An internal mirror may appear to be over-investment, but given observed instances of malicious skills actually circulating in the wild, it is a justified cost for pulling the supply chain boundary back to the enterprise side.
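Short of a full internal mirror, a lightweight first step is to pin an expected digest per reviewed Skill version and verify it before installation. The sketch below assumes a simple internal manifest; it is not a standard Skill packaging mechanism.

```python
import hashlib
from pathlib import Path

# Hypothetical internal manifest: (skill name, version) -> SHA-256 of the
# archive that passed internal review. The digest below is a placeholder.
PINNED = {
    ("report-generator", "1.4.2"):
        "9f2c0a4e6b1d8f3a5c7e9b0d2f4a6c8e1b3d5f7a9c0e2b4d6f8a0c2e4b6d8f0a",
}

def verify_skill(name: str, version: str, archive: Path) -> bool:
    """Refuse to install a Skill archive whose digest does not match the pinned value."""
    expected = PINNED.get((name, version))
    if expected is None:
        return False  # unreviewed version: deny by default
    digest = hashlib.sha256(archive.read_bytes()).hexdigest()
    return digest == expected
```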
By minimizing the permissions of commands and APIs invoked by MCP / Skills, and sandboxing agent execution at both the OS and network levels, damage can be contained even if a malicious tool is introduced. The role of Step 2 is to physically limit the blast radius, on the premise that "by design" vulnerabilities permit certain behaviors.
Privilege separation should be designed at both the OS layer and the communication layer.
The basic principle is to isolate the agent itself, the MCP client, and the MCP server into separate execution contexts.
| Isolation Layer | Recommended Implementation | Scenarios Prevented |
|---|---|---|
| User | Run under a dedicated OS account | Access to individual developer files |
| Container | Separate container per server | Lateral movement between containers |
| Network | Allow only necessary endpoints | Arbitrary external API calls |
| Filesystem | Read-only mount + write access limited to working directory only | Destruction or exfiltration of business files |
A setup described as "just running it in Docker" is often insufficiently isolated in practice. Configurations such as running as root inside the container, mounting the host's Docker socket, or sharing /var/run cannot be called a sandbox. The safe approach is to restrict the set of commands a Skill can execute to only allowlisted executables, and to block all other system calls.
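One way to approximate that, alongside OS-level controls such as seccomp or AppArmor profiles and non-root containers, is a launcher that refuses anything outside an executable allowlist. The sketch below is illustrative; the allowed set and the working directory are assumptions.

```python
import shutil
import subprocess

# Only these executables may be invoked by Skill tool handlers (example set).
ALLOWED_EXECUTABLES = {"grep", "jq", "pdftotext"}

def run_tool(argv: list[str], workdir: str = "/sandbox/work") -> str:
    """Run an allowlisted executable with no shell, a timeout, and a fixed working directory."""
    if not argv or argv[0] not in ALLOWED_EXECUTABLES:
        raise PermissionError(f"executable not allowlisted: {argv[:1]}")
    binary = shutil.which(argv[0])
    if binary is None:
        raise FileNotFoundError(argv[0])
    result = subprocess.run(
        [binary, *argv[1:]],
        cwd=workdir,              # hypothetical sandbox working directory
        capture_output=True, text=True,
        timeout=30, shell=False,  # never hand arguments to a shell
    )
    return result.stdout
```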
It has been noted that nearly 40% of MCP servers may carry SSRF vulnerabilities (based on analysis by BlueRock Security). This refers to the phenomenon where MCP / Skills can "issue arbitrary HTTP requests to internal networks or cloud metadata endpoints."
Defense centers on allowlisting egress (outbound) traffic.
- Block the cloud metadata endpoint 169.254.169.254 (as an additional layer of defense even when IMDSv2 is enforced)
- Block private address ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, etc.) except for operationally necessary destinations

An MCP / Skill execution environment without egress controls is adjacent to the risk of having AWS IAM role temporary credentials stolen. When operating in the cloud, egress filtering should not be treated as "nice to have"; its absence should be regarded as a risk that has already materialized.
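A host-level firewall or egress proxy is the natural enforcement point, but an application-side check adds a useful second layer. The sketch below rejects destinations that resolve to metadata or private addresses before a tool issues an HTTP request; the function name and the blocked set are assumptions, and this alone does not defend against DNS rebinding.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Deny-by-default destinations: cloud metadata plus private / link-local / loopback ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("169.254.0.0/16"),   # link-local, incl. 169.254.169.254
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("127.0.0.0/8"),
]

def egress_allowed(url: str) -> bool:
    """Resolve the destination host and refuse private or metadata addresses."""
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(socket.gethostbyname(host))
    except (socket.gaierror, ValueError):
        return False  # unresolvable destinations are denied, not allowed
    return not any(addr in net for net in BLOCKED_NETWORKS)
```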
By logging the input prompts and output actions of MCP / Skills and detecting anomalous patterns, you can achieve early attack detection while simultaneously satisfying compliance requirements such as PDPA. If Steps 1 and 2 represent "narrowing the entry points" as a defense, Step 3 is the layer for "detecting intrusions after they occur and fulfilling accountability."
Input/output guards are easier to organize when viewed as an extension of the patterns discussed in Prompt Injection Defense and AI Guardrails Implementation.
There are three input pathways for MCP / Skills: (1) direct prompts from users, (2) strings ingested from RAG, databases, or files, and (3) responses from other MCP servers or other agents.
Particular attention should be paid to (2) and (3), which involve "indirect prompt injection," where attack instructions are injected via data without the user's awareness.
| Inspection Item | Implementation Pattern |
|---|---|
| Removal of control characters and zero-width characters | Normalize at ingestion time |
| Detection of known jailbreak patterns | Prompt-based filter + LLM-as-a-Judge |
| Tool call confirmation | High-risk operations (deletion, fund transfers, external transmission) require human-in-the-loop (HITL) approval |
| Metadata contamination | Separate source and timestamp into distinct fields |
Since "scrutinizing every request with an LLM" is often impractical, a policy of applying heavy inspection only to tool calls involving writes, deletions, or external communication tends to strike the best balance between cost and risk.
The basic principle for MCP / Skill invocations is to retain structured logs capturing "when, who, which agent, which tool, with which arguments, and what was returned." The following outlines the minimum items to include in logs.
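A minimal sketch of such a structured record, covering the when / who / agent / tool / arguments / result fields; the field names are illustrative rather than a standard schema.

```python
import json
import uuid
from datetime import datetime, timezone

def audit_record(user: str, agent_id: str, tool: str, args: dict, result_summary: str) -> str:
    """Build one structured, SIEM-friendly log line per MCP / Skill invocation."""
    record = {
        "event": "tool_call",
        "call_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
        "user": user,                                          # who
        "agent_id": agent_id,                                  # which agent
        "tool": tool,                                          # which tool
        "arguments": args,                                     # with which arguments
        "result_summary": result_summary,                      # what was returned (truncated / redacted)
    }
    return json.dumps(record, ensure_ascii=False)

print(audit_record("alice", "agent-01", "read_file",
                   {"path": "/data/report.csv"}, "returned 2,412 bytes"))
```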
Common signals for anomaly detection include: (a) a high volume of tool calls in a short period, (b) egress to domains not normally accessed, (c) large-scale file reads, and (d) external transmission of responses containing personal data. Starting with simple rules that can be integrated into existing SIEM / SOC tools allows for early visibility while keeping initial investment low.
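Signal (a), for example, can be expressed as a simple sliding-window rule either in the SIEM or in a few lines of code; the window and threshold below are placeholders to be tuned per environment.

```python
from collections import deque
import time

WINDOW_SECONDS = 60
MAX_CALLS_PER_WINDOW = 50  # placeholder threshold, tune per environment

_recent_calls: deque[float] = deque()

def record_call_and_check() -> bool:
    """Record one tool call; return True if the short-window call rate looks anomalous."""
    now = time.monotonic()
    _recent_calls.append(now)
    while _recent_calls and now - _recent_calls[0] > WINDOW_SECONDS:
        _recent_calls.popleft()
    return len(_recent_calls) > MAX_CALLS_PER_WINDOW
```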
Audit logs serve not only for attack detection, but also play a role in ensuring that access histories for personal data can be presented in response to PDPA requirements or audits. Combined with encryption implementations such as AES-256, this enables both confidentiality at rest and traceability.
The reality is that the assumptions of "default settings are sufficient" and "it's safe because it's on an internal network" no longer hold for MCP / Skill defenses. The most common failures stem from structural overconfidence.
Here we examine two typical examples that are commonly observed.
The most frequently encountered misconception is the idea that "since it's running on a local machine, it won't reach the outside." The core issue with the vulnerability in Anthropic's MCP reference implementation was that arbitrary commands could be executed on the user's machine even in a local execution environment. The following outlines what can specifically occur.
"Local" should not be treated as "outside the corporate perimeter," but rather as another high-privilege host with access to business systems. Internal guidelines should ensure that PCs running MCP / Skills do not have direct access to production databases; where necessary, switching to a bastion host with short-lived credentials significantly reduces the blast radius.
Another common pattern is development convenience settings leaking into production.
- MCP_ALLOW_ALL=true is set in production as well

These are classic misconfigurations, but in the age of AI agents, the blast radius is far greater. Separating MCP / Skill configurations for development, staging, and production into distinct files, and implementing a mechanism in the CI/CD pipeline to reject unauthorized MCPs for production at build time (sketched below), is highly effective. Managing configurations as Infrastructure as Code and including them in code review makes issues easier to catch.
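As a sketch of such a build-time gate, assuming MCP configuration is kept in per-environment JSON files whose names and fields are illustrative:

```python
import json
import sys

# Hypothetical reviewed allowlist of MCP server hosts permitted in production.
PRODUCTION_ALLOWLIST = {"mcp.internal.example.com", "vendor-mcp.example.com"}

def check_production_config(path: str) -> int:
    """Fail the build (non-zero exit) if the production MCP config references
    unapproved servers or development convenience flags."""
    with open(path) as f:
        config = json.load(f)
    errors = []
    if config.get("MCP_ALLOW_ALL"):
        errors.append("MCP_ALLOW_ALL must not be enabled in production")
    for server in config.get("servers", []):
        if server.get("host") not in PRODUCTION_ALLOWLIST:
            errors.append(f"unapproved MCP server: {server.get('host')}")
    for err in errors:
        print(f"ERROR: {err}", file=sys.stderr)
    return 1 if errors else 0

if __name__ == "__main__":
    # Config file name and structure are assumptions for illustration.
    sys.exit(check_production_config("mcp.production.json"))
```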
The priority order for MCP / Skill defense is: inventory current usage → allowlisting → least-privilege enforcement → auditing. Trying to implement everything at once tends to fail. This section answers the two questions most frequently asked in the field.
Q. What additional steps should companies already compliant with Thailand's PDPA take when operating AI agents?
In the context of PDPA, when AI agents handle personal data, four additional requirements apply: (1) explicit statement of processing purpose, (2) access logging, (3) restrictions on cross-border transfers, and (4) handling of data subject requests.
The following measures are practical to implement:
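As one concrete example of such a measure, access logging that records the processing purpose and any cross-border transfer for each tool call touching personal data can be sketched as follows; the field names are assumptions, not a legal or standard schema.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json

@dataclass
class PersonalDataAccess:
    """One access record for an agent tool call that touched personal data."""
    timestamp: str
    agent_id: str
    data_subject_category: str   # e.g. "customer", "employee"
    processing_purpose: str      # explicit purpose, stated up front
    fields_accessed: list        # which attributes were read
    cross_border_transfer: bool  # True if the data left Thailand

def log_access(agent_id: str, category: str, purpose: str,
               fields: list, cross_border: bool) -> str:
    rec = PersonalDataAccess(
        timestamp=datetime.now(timezone.utc).isoformat(),
        agent_id=agent_id,
        data_subject_category=category,
        processing_purpose=purpose,
        fields_accessed=fields,
        cross_border_transfer=cross_border,
    )
    return json.dumps(asdict(rec), ensure_ascii=False)

print(log_access("agent-01", "customer", "invoice generation",
                 ["name", "address"], cross_border=False))
```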
Combining this with the key management patterns covered in Thailand PDPA-Compliant AES-256 Encryption Implementation makes it easier to satisfy audit requirements for both data at rest and data in transit.
Q. If an existing SOC / SIEM is already in place, how should AI agent monitoring be integrated?
Rather than standing up a new monitoring infrastructure, the practical approach is to ingest "MCP / Skill calls" as a new data source into the existing SIEM. The following four steps are recommended as a starting point:
- Ingest MCP / Skill tool_call events into the existing SIEM as well
- Route high-risk operations (exec, http_post, file deletion, etc.) to a separate alert class with higher priority

Rather than creating a dedicated AI monitoring team, the goal should be to enable the existing SOC to treat AI agents as one new workload among others; this approach is sustainable both in terms of staffing and operations. Adversarial exercises are covered in the AI Red Teaming Practical Guide and should be incorporated alongside defensive exercises.
AI agent supply chain attacks have introduced an attack surface that traditional security measures were never designed to address, brought about by the emergence of MCP and Skills as new distribution channels and by design choices that Anthropic itself acknowledges as intentional.
Defense should be architected in three layers rather than around a single tool: (1) allowlisting trusted delivery channels, (2) least privilege and sandboxing, and (3) input/output guards with audit logging.
Putting everything in place at once is not realistic. Starting with an inventory of current MCP / Skill usage and first narrowing down the use of public MCPs is the highest-value first move.
For related reading, Introduction to AI Agent Protocols (MCP & A2A), AI Guardrails Implementation, AI Red Teaming, and Claude Mythos and Project Glasswing together provide a three-dimensional view of AI agent defense from both the detection and adversarial perspectives.

Yusuke Ishihara
Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).