AI Agent Supply Chain Attack Defense Guide — Protective Implementation for MCP/Skill Distribution Channels

Lead

An AI agent supply chain attack is an attack that compromises the delivery channels of external components—such as MCP servers, Skills, and plugins—that agents load at runtime, injecting arbitrary code or malicious instructions into enterprise AI execution environments.

While traditional supply chain attacks targeted "libraries" and "container images," the attack surface in the AI agent era has expanded to "MCP/Skills dynamically loaded at runtime." Two factors are accelerating the problem: an execution model that Anthropic itself describes as intentional design, and the large number of public MCP servers exposed to the internet.

This guide is intended for IT, SRE, and security personnel. It walks through a three-layer defense in three steps: (1) allowlisting trusted sources, (2) least privilege and sandboxing, and (3) input/output guards and audit logging. By the end, readers should be able to decide "what to restrict first and what to monitor" in their own organization's agent operating environment.

MCP servers, Skills, and plugins receive external instructions and then execute commands, perform file operations, and make API calls on enterprise PCs, so a compromise propagates immediately to corporate business systems.

This section organizes the structural reasons behind the expanded attack surface and documents real-world cases that have come to light.

MCP / Skill Execution Model

MCP (Model Context Protocol) is a common protocol that enables agents to invoke external tools, data sources, and code. The fundamentals are covered in Introduction to AI Agent Protocols (MCP & A2A). Skills extend this by providing a mechanism for distributing reusable workflows as "skills."

The execution model consists of three layers.

Layer | Role | Where Risk Resides
Agent core | Reasoning and decision-making | Prompt injection
MCP client | Handles protocol communication | Communication tampering / authentication bypass
MCP server / Skill | Executes actual commands | Arbitrary code execution / data exfiltration

In particular, the lower-layer MCP server is designed to "execute OS commands in response to requests sent from the client." This execution capability is precisely what makes it an entry point for supply chain attacks.

Incidents That Surfaced in 2026

Multiple public sources have reported incidents related to AI agent supply chains. Three representative cases are listed below.

  • "By design" RCE vulnerability in Anthropic MCP: An advisory published by OX Security on April 15, 2026 noted that the reference implementation of the Model Context Protocol was designed in a way that permits arbitrary command execution. Anthropic declined to make significant changes to the STDIO execution model, characterizing it as a "safe default by design," and indicated that sanitization is the responsibility of the user (sources: OX Security / SecurityWeek). The cumulative impact is reported to exceed 150 million downloads.
  • Mass exposure of public MCP servers: BlueRock Security analyzed over 7,000 MCP servers and reported that approximately 36.7% showed candidate SSRF (Server-Side Request Forgery) vulnerabilities. Numerous MCP servers exposed to public networks without authentication have also been reported.
  • Delivery of malicious skills via marketplaces: Multiple security researchers have publicly reported observations of malicious skill distribution through AI agent "skill markets."

Regarding the state of defensive readiness, Cisco's "State of AI Security 2026" reported that only approximately 29% of organizations are prepared to deploy agentic AI in production. The attack surface continues to expand while defensive measures have yet to catch up.

Step 1: Trusted Source Allowlists and Skill Delivery Path Verification

The first line of defense is to explicitly define which MCP servers and Skills are permitted to execute via an allowlist. Allowing everything by default is effectively no defense at all. Given that design-level vulnerabilities have come to light, the only workable starting point is for the enterprise itself to restrict which delivery channels it trusts.

The delivery channels to be assessed fall into three categories: (1) MCP servers on public networks, (2) Skills distributed through marketplaces, and (3) internally developed MCP/Skills. The nature of the risk differs for each.
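
As a minimal sketch of the deny-by-default idea, the check below admits a component only when its name and source host both match a reviewed entry. The `AllowedSource` schema, the entries, and the `example.com` hosts are hypothetical and exist only for illustration; a real list would live in a reviewed configuration file.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class AllowedSource:
    """One reviewed MCP server or Skill source (hypothetical schema)."""
    name: str
    host: str      # exact host the component may be fetched from or connected to
    channel: str   # "public", "marketplace", or "internal"


# Hypothetical entries; a real list would come from a reviewed config file.
ALLOWLIST = {
    AllowedSource("ticket-search", "mcp.internal.example.com", "internal"),
    AllowedSource("crm-connector", "mcp.vendor.example.com", "public"),
}


def is_allowed(name: str, url: str, allowlist=ALLOWLIST) -> bool:
    """Deny by default: allow only exact (name, host) matches over HTTPS."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname is None:
        return False
    return any(
        entry.name == name and entry.host == parsed.hostname
        for entry in allowlist
    )
```

Matching on the exact host rather than a suffix or substring is a deliberate choice here: it avoids accepting look-alike domains, at the cost of needing one entry per host.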

Risk Assessment of Public MCP Servers

When using public MCP servers, verify at minimum the following:

  • Authentication methods (OAuth, API keys, mTLS) are documented, and no unauthenticated endpoints exist
  • The provider's organizational information, contact details, and vulnerability reporting channels are clearly stated
  • Communications are encrypted with TLS, with additional protections such as certificate pinning in place
  • The provider's SBOM and changelog are publicly available, enabling vulnerability tracking for dependent libraries

Public MCP servers that anyone can access without authentication are not suitable for business use. A practical starting point is to limit usage to "MCP servers confined within the internal network" or "provider-direct MCP with authentication," and then to evaluate and add externally public MCPs incrementally.

Signing, SBOM, and Tamper Detection

Skills and MCP packages should be treated under the assumption that they "can be tampered with, just like container images or npm packages."

  • Signature verification: Confirm that distributions carry the provider's signature (Sigstore / cosign, etc.) and only accept those that pass verification
  • SBOM: Obtain a list of dependent libraries used within the Skill and cross-reference against known CVEs
  • Hash pinning: Save the hash at the time of ingestion, periodically recalculate it, and compare to detect any tampering
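
The hash pinning item can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not a complete integrity system: the function names are hypothetical, and where the pinned digest is stored (ideally somewhere the agent host cannot write to) is left to the reader.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unmodified(path: Path, pinned_hex: str) -> bool:
    """Compare the current digest with the one recorded at ingestion."""
    return hmac.compare_digest(sha256_of(path), pinned_hex.lower())
```

Calling `sha256_of` once at ingestion and `is_unmodified` on a schedule gives the "save, recalculate, compare" loop described above.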

The ideal approach is to maintain an "internal mirror that disallows automatic updates from the marketplace and distributes only versions that have passed internal review." An internal mirror may appear to be over-investment, but given observed instances of malicious skills actually circulating in the wild, it is a justified cost for pulling the supply chain boundary back to the enterprise side.

Step 2: Least Privilege and Sandboxing

By minimizing the permissions of commands and APIs invoked by MCP / Skills, and sandboxing agent execution at both the OS and network levels, damage can be contained even if a malicious tool is introduced. The role of Step 2 is to physically limit the blast radius, on the premise that "by design" vulnerabilities permit certain behaviors.

Privilege separation should be designed at both the OS layer and the communication layer.

Privilege Separation and Execution Isolation

The basic principle is to isolate the agent itself, the MCP client, and the MCP server into separate execution contexts.

Isolation Layer | Recommended Implementation | Scenarios Prevented
User | Run under a dedicated OS account | Access to individual developer files
Container | Separate container per server | Lateral movement between containers
Network | Allow only necessary endpoints | Arbitrary external API calls
Filesystem | Read-only mount + write access limited to working directory only | Destruction or exfiltration of business files

A setup described as "just running it in Docker" is often insufficiently isolated in practice. A container that runs as root, mounts the host's Docker socket, or shares /var/run with the host cannot be called a sandbox. The safer approach is to restrict the commands a Skill can execute to allowlisted executables, and to block the system calls the workload does not need, for example with a seccomp profile.
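
To make the difference from "just running it in Docker" concrete, the sketch below assembles a `docker run` command line that applies the isolation layers from the table. The function name, the UID/GID `10001`, the `/work` mount point, and the network name are assumptions chosen for illustration; the flags themselves are standard `docker run` options.

```python
def hardened_docker_args(image: str, workdir: str, network: str) -> list[str]:
    """Build a `docker run` command line for one MCP server (illustrative)."""
    return [
        "docker", "run", "--rm",
        "--user", "10001:10001",                 # non-root UID:GID (hypothetical value)
        "--read-only",                           # root filesystem mounted read-only
        "--tmpfs", "/tmp",                       # scratch space that is not persisted
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block setuid privilege escalation
        "--network", network,                    # dedicated network, not the default bridge
        "--volume", f"{workdir}:/work:rw",       # the only writable host path
        image,
    ]
```

The list can be passed directly to `subprocess.run`. Note what is absent: no Docker socket mount, no `--privileged`, and no host path other than the working directory.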

SSRF Defense and Egress Control

BlueRock Security's analysis found that approximately 36.7% of the MCP servers it examined showed candidate SSRF vulnerabilities. In this context, SSRF means that an MCP server or Skill can be induced to "issue arbitrary HTTP requests to internal networks or cloud metadata endpoints."

Defense centers on allowlisting egress (outbound) traffic.

  • Fully block communication to the metadata endpoint 169.254.169.254 (also as an additional defense even when IMDSv2 is enforced)
  • Block communication to private IP ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, etc.) except for operationally necessary destinations
  • Restrict external communication to HTTPS only, limited by a domain allowlist
  • Account for DNS rebinding attacks by incorporating re-validation after IP resolution
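
The four rules above can be combined into one pre-flight check. The following is a sketch under stated assumptions: `is_egress_allowed` and `resolve_all` are hypothetical names, the resolver is injectable so the logic can be exercised without network access, and an application-level check like this complements network-level egress filtering rather than replacing it.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_all(host: str) -> list[str]:
    """Resolve a host name to every IP address it currently maps to."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_egress_allowed(url, allowed_domains, resolve=resolve_all) -> bool:
    """HTTPS only, allowlisted domain, and no internal or metadata IPs."""
    parsed = urlparse(url)
    host = parsed.hostname
    if parsed.scheme != "https" or host is None:
        return False
    if host not in allowed_domains:
        return False
    addresses = resolve(host)
    if not addresses:
        return False
    # Re-validate after resolution so an allowlisted name cannot point inward.
    for raw in addresses:
        ip = ipaddress.ip_address(raw)
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

To hold up against DNS rebinding, the caller would also need to connect to the address that was validated instead of resolving the name a second time; that connection step is outside this sketch.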

An MCP / Skill execution environment without egress controls leaves the temporary credentials of AWS IAM roles exposed to theft. When operating in the cloud, egress filtering is not a "nice to have"; its absence should be treated as a risk that has already materialized.

Step 3: Input/Output Guards and Auditing

By logging the input prompts and output actions of MCP / Skills and detecting anomalous patterns, you can detect attacks early while also satisfying compliance requirements such as Thailand's Personal Data Protection Act (PDPA). If Steps 1 and 2 are the defenses that narrow the entry points, Step 3 is the layer for detecting intrusions after they occur and fulfilling accountability.

Input/output guards are easier to organize when viewed as an extension of the patterns discussed in Prompt Injection Defense and AI Guardrails Implementation.

Input Sanitization

There are three input pathways for MCP / Skills: (1) direct prompts from users, (2) strings ingested from RAG, databases, or files, and (3) responses from other MCP servers or other agents.

Particular attention should be paid to (2) and (3), which involve "indirect prompt injection," where attack instructions are injected via data without the user's awareness.

Inspection Item | Implementation Pattern
Removal of control characters and zero-width characters | Normalize at ingestion time
Detection of known jailbreak patterns | Prompt-based filter + LLM-as-a-Judge
Tool call confirmation | High-risk operations (deletion, fund transfers, external transmission) require human approval via HITL
Metadata contamination | Separate source and timestamp into distinct fields
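
The first row of the table, normalizing at ingestion time, might look like the following. This is a minimal sketch using Unicode general categories; the function name is hypothetical, and keeping newlines and tabs is an assumption about what downstream prompts need.

```python
import unicodedata

# Whitespace control characters that are kept on purpose.
KEEP = {"\n", "\t"}


def normalize_untrusted_text(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, then apply NFKC."""
    cleaned = "".join(
        ch for ch in text
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    return unicodedata.normalize("NFKC", cleaned)
```

Category Cf covers zero-width spaces and bidirectional override characters, which are common carriers for hidden instructions. It also covers the zero-width joiner used in some emoji sequences, so those will render differently after cleaning; that trade-off is usually acceptable for text headed into a prompt.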

Since "scrutinizing every request with an LLM" is often impractical, a policy of applying heavy inspection only to tool calls involving writes, deletions, or external communication tends to strike the best balance between cost and risk.
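
One way to express this tiering is a small gate in front of tool execution. The sketch below is illustrative only: the tool names in `HIGH_RISK_TOOLS` are hypothetical, and `execute` and `approve` are caller-supplied callbacks standing in for the real tool runner and the human approval (HITL) step.

```python
from typing import Callable

# Hypothetical tool names, grouped by the risk classes named in the text.
HIGH_RISK_TOOLS = {"delete_file", "transfer_funds", "http_post", "exec"}


def inspection_level(tool_name: str) -> str:
    """Return "heavy" for write, delete, or external-send tools, else "light"."""
    return "heavy" if tool_name in HIGH_RISK_TOOLS else "light"


def run_tool_call(tool_name: str, args: dict,
                  execute: Callable[[str, dict], object],
                  approve: Callable[[str, dict], bool]) -> object:
    """Execute a tool call, asking a human approver first when it is high risk."""
    if inspection_level(tool_name) == "heavy" and not approve(tool_name, args):
        raise PermissionError(f"tool call {tool_name!r} was not approved")
    return execute(tool_name, args)
```

Low-risk calls never reach the approver, which keeps the cost of heavy inspection proportional to the number of risky operations rather than to total traffic.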

Audit Logs and Anomaly Detection

The basic principle for MCP / Skill invocations is to retain structured logs capturing "when, who, which agent, which tool, with which arguments, and what was returned." The following outlines the minimum items to include in logs.

  • Request ID (linkable to the agent session)
  • Calling agent ID and user ID (including approver in the case of HITL)
  • Tool name and parameters (PII masked)
  • Size of return value and external communication destination
  • Whether exceptions or timeouts occurred
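
A record carrying these items could be emitted as one JSON object per line. The field names and the `PII_KEYS` set below are assumptions made for the sketch, not a standard schema, and key-based masking is only a first pass: values that contain personal data under other keys would need additional handling.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical parameter names treated as personal data.
PII_KEYS = {"email", "phone", "national_id", "name"}


def mask_params(params: dict) -> dict:
    """Replace values of PII-looking keys so they never reach the log."""
    return {k: "***" if k in PII_KEYS else v for k, v in params.items()}


def audit_record(agent_id, user_id, tool, params, result_bytes,
                 egress_host=None, error=None, approver_id=None,
                 request_id=None) -> str:
    """Serialize one tool call as a single JSON Lines entry."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "request_id": request_id or str(uuid.uuid4()),
        "agent_id": agent_id,
        "user_id": user_id,
        "approver_id": approver_id,      # set only for HITL-approved calls
        "tool": tool,
        "params": mask_params(params),
        "result_bytes": result_bytes,    # size only, not the payload
        "egress_host": egress_host,
        "error": error,                  # exception or timeout description
    }
    return json.dumps(record, ensure_ascii=False)
```

Logging the size of the return value rather than its contents keeps personal data out of the log while still supporting volume-based detection.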

Common signals for anomaly detection include: (a) a high volume of tool calls in a short period, (b) egress to domains not normally accessed, (c) large-scale file reads, and (d) external transmission of responses containing personal data. Starting with simple rules that can be integrated into existing SIEM / SOC tools allows for early visibility while keeping initial investment low.
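
Signal (a), a burst of tool calls, is simple enough to show as a sliding-window rule. The class name and thresholds are hypothetical; in practice the same rule would more likely be written in the SIEM's own query language, and this sketch only illustrates the logic.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag an agent that exceeds max_calls within window_seconds."""

    def __init__(self, max_calls: int, window_seconds: float):
        self.max_calls = max_calls
        self.window = window_seconds
        self._calls = defaultdict(deque)   # agent_id -> recent call timestamps

    def observe(self, agent_id: str, ts: float) -> bool:
        """Record one call at time ts; return True if the burst rule fires."""
        q = self._calls[agent_id]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while q and q[0] <= ts - self.window:
            q.popleft()
        return len(q) > self.max_calls
```

Timestamps are passed in rather than read from the clock, so the rule can be replayed over historical logs when tuning the threshold.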

Audit logs serve not only for attack detection, but also play a role in ensuring that access histories for personal data can be presented in response to PDPA requirements or audits. Combined with encryption implementations such as AES-256, this enables both confidentiality at rest and traceability.

Common Failures and Countermeasures

The reality is that the assumptions of "default settings are sufficient" and "it's safe because it's on an internal network" no longer hold for MCP / Skill defenses. The most common failures stem from structural overconfidence.

Two typical examples follow.

The "Local Means Safe" Misconception

The most frequently encountered misconception is the idea that "since it's running on a local machine, it won't reach the outside." The core issue with the vulnerability in Anthropic's MCP reference implementation was that arbitrary commands could be executed on the user's machine even in a local execution environment. The following outlines what can specifically occur.

  • Theft of cookies, passwords, and SSH keys stored in the browser
  • Lateral movement to cloud management consoles (AWS / GCP) reachable from the developer's PC
  • Exfiltration of source code and customer data from other projects stored within the IDE

"Local" should not be treated as "outside the corporate perimeter," but rather as another high-privilege host with access to business systems. Internal guidelines should ensure that PCs running MCP / Skills do not have direct access to production databases; where necessary, switching to a bastion host with short-lived credentials significantly reduces the blast radius.

Development Configurations Leaking into Production

Another common pattern is development convenience settings leaking into production.

  • An unauthenticated MCP server used in development gets left in the production Docker image and deployed as-is
  • The environment variable MCP_ALLOW_ALL=true is set in production as well
  • An API with CORS fully opened for debugging continues to run in production unchanged

These are classic misconfigurations, but in the age of AI agents, the blast radius is far greater. Separating MCP / Skill configurations for development, staging, and production into distinct files, and implementing a mechanism in the CI/CD pipeline to reject unauthorized MCPs for production at build time, is highly effective. Managing configurations as Infrastructure as Code and including them in code review makes issues easier to catch.
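
A build-time gate of this kind can be a short script that the pipeline runs against the production configuration. The config layout below (`env`, `mcp_servers`, `name`, `auth`) is a hypothetical schema used only for this sketch; `MCP_ALLOW_ALL` is the example variable from the list above.

```python
def production_config_errors(config: dict, approved_servers: set) -> list:
    """Return reasons to fail the build; an empty list means the config passes."""
    errors = []
    env = config.get("env", {})
    if str(env.get("MCP_ALLOW_ALL", "")).lower() == "true":
        errors.append("MCP_ALLOW_ALL must not be enabled in production")
    for server in config.get("mcp_servers", []):
        name = server.get("name", "<unnamed>")
        if name not in approved_servers:
            errors.append(f"MCP server {name!r} is not approved for production")
        if not server.get("auth"):
            errors.append(f"MCP server {name!r} has no authentication configured")
    return errors
```

In a pipeline step, a non-empty result would be printed and followed by a non-zero exit code so that the build stops before the image is published.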

FAQ: Where to Start

The priority order for MCP / Skill defense is: inventory current usage → allowlisting → least-privilege enforcement → auditing. Trying to implement everything at once tends to fail. This section answers the two questions most frequently asked in the field.

Alignment with PDPA / Audit Requirements

Q. What additional steps should companies already compliant with Thailand's PDPA take when operating AI agents?

In the context of PDPA, when AI agents handle personal data, four additional requirements apply: (1) explicit statement of processing purpose, (2) access logging, (3) restrictions on cross-border transfers, and (4) handling of data subject requests.

The following measures are practical to implement:

  • Tag MCP / Skill audit logs with "categories of personal data processed"
  • Isolate Skills that handle personal data into a separate group, with distribution rights restricted to personnel who have completed PDPA training
  • For MCPs that involve cross-border transmission (e.g., calls to overseas APIs), combine egress filtering with a prior consent mechanism
  • Set log retention periods long enough to meet business requirements, so that access history disclosure requests from data subjects can be fulfilled

Combining this with the key management patterns covered in Thailand PDPA-Compliant AES-256 Encryption Implementation makes it easier to satisfy audit requirements for both data at rest and data in transit.

Integration Policy with Existing SOC

Q. If an existing SOC / SIEM is already in place, how should AI agent monitoring be integrated?

Rather than standing up a new monitoring infrastructure, the practical approach is to ingest "MCP / Skill calls" as a new data source into the existing SIEM. The following four steps are recommended as a starting point:

  1. Standardize MCP / Skill logs in JSON Lines format and feed them into the existing SIEM ingestion pipeline
  2. Apply existing jailbreak and data exfiltration detection rules to tool_call events as well
  3. Register AI agent sessions as entities equivalent to "user sessions"
  4. Elevate high-risk tool calls (exec, http_post, file deletion, etc.) to a separate alert class with higher priority
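
Steps 3 and 4 amount to a small enrichment pass before events reach the SIEM. The sketch below assumes each input line is a JSON object with `tool` and `session_id` fields; the output field names and the alert class labels are hypothetical, and `file_delete` stands in for whatever the file deletion tool is actually called.

```python
import json

# Tool names taken from the examples in the text; the class labels are hypothetical.
HIGH_RISK = {"exec", "http_post", "file_delete"}


def enrich_for_siem(json_line: str) -> str:
    """Add entity and alert-class fields to one JSON Lines tool_call event."""
    event = json.loads(json_line)
    event["event_type"] = "tool_call"
    # Treat the agent session like a user session for entity correlation.
    event["entity_type"] = "session"
    event["entity_id"] = event.get("session_id")
    event["alert_class"] = (
        "ai_agent_high_risk" if event.get("tool") in HIGH_RISK else "ai_agent_default"
    )
    return json.dumps(event)
```

Because the agent session is mapped onto the same entity type as a user session, existing correlation rules can apply without a separate data model.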

Rather than creating a dedicated AI monitoring team, the goal should be to enable the existing SOC to treat AI agents as one new workload among others — an approach that is sustainable both in terms of staffing and operations. Adversarial exercises are covered in the AI Red Teaming Practical Guide, and should be incorporated alongside defensive exercises.

Summary

AI agent supply chain attacks open an attack surface that traditional security measures were never designed to address. It stems from the emergence of MCP and Skills as new distribution channels, and from design choices that Anthropic itself describes as intentional.

Defense should be architected in three layers, not around a single tool:

  • Step 1: Restrict which MCPs / Skills are ingested using an allowlist, and verify the distribution chain with signatures and SBOMs
  • Step 2: Execute within a least-privilege sandbox, with SSRF / egress physically blocked
  • Step 3: Audit-log all calls, and connect them to anomaly detection and PDPA / SOC requirements

Putting everything in place at once is not realistic. Starting with an inventory of current MCP / Skill usage and first narrowing down the use of public MCPs is the highest-value first move.

For related reading, Introduction to AI Agent Protocols (MCP & A2A), AI Guardrails Implementation, AI Red Teaming, and Claude Mythos and Project Glasswing together give a fuller picture of AI agent defense from both the detection and adversarial perspectives.

Author & Supervisor

Yusuke Ishihara

Started programming at age 13 with MSX. After graduating from Musashi University, worked on large-scale system development including airline core systems and Japan's first Windows server hosting/VPS infrastructure. Co-founded Site Engine Inc. in 2008. Founded Unimon Inc. in 2010 and Enison Inc. in 2025, leading development of business systems, NLP, and platform solutions. Currently focuses on product development and AI/DX initiatives leveraging generative AI and large language models (LLMs).