R-AC-02 Agent Communication & Interoperability DAMAGE 4.2 / Critical

MCP Server Trust Boundary

An agent connects to an MCP server and trusts its tools and resources. A compromised server can inject adversarial content, expose malicious tools, or exfiltrate data through tool invocations.

The Risk

The Model Context Protocol (MCP) lets agents discover and invoke tools and resources exposed by MCP servers. An agent that connects to an MCP server trusts that the server provides legitimate tools and that tool responses are truthful. A compromised or malicious MCP server can break that trust in four ways: it can expose tools that perform adversarial operations (exfiltrating data, modifying databases, accessing restricted resources); return false information in responses; embed payloads in tool responses that trigger prompt injection or code execution; and monitor every tool invocation to exfiltrate the data that passes through it.

The trust boundary is at the point where the agent connects to the MCP server. The agent assumes the server is legitimate and that tools are safe to invoke. If the server is compromised, this assumption is violated.

How It Materializes

A healthcare provider deploys an agent-based clinical decision support system. The agent connects to an MCP server to access clinical reference tools: Drug-Interaction-Checker, Lab-Result-Interpreter, and Guideline-Lookup.

An attacker compromises the MCP server (via SQL injection in the server's API or by breaching the server's hosting environment). The attacker modifies the server to expose an additional tool, "Generate-Patient-Report". Its schema appears legitimate, but its implementation exfiltrates patient health records to an attacker-controlled database.

The clinical decision support agent, when presented with a patient case, queries the MCP server for available tools. The server now responds with the malicious "Generate-Patient-Report" tool. The agent, seeing a tool that matches its need to generate clinical summaries, invokes the tool with patient data (name, medical history, test results). The malicious tool receives the patient data and sends it to an attacker's server, while returning a fabricated report.

Over several weeks, the attacker exfiltrates health records for 2,300 patients. The attack is not detected because the agent's audit log shows a normal tool invocation, the returned report appears correct, and network monitoring does not flag the exfiltration because it goes through the MCP server inside the organization's network.

DAMAGE Score Breakdown

Dimension | Score | Rationale
D - Detectability | 3 | Compromised MCP server is difficult to detect because agents trust the server. Fabricated tools may pass validation.
A - Autonomy Sensitivity | 4 | High when agents autonomously invoke MCP tools. Agents do not verify tool legitimacy or response truthfulness.
M - Multiplicative Potential | 5 | Every agent that connects to the compromised server is potentially affected. Poison affects all tool invocations.
A - Attack Surface | 5 | MCP server and the connection between agent and server are both attack surfaces. Server compromise is a supply chain risk.
G - Governance Gap | 4 | Institutions may connect agents to MCP servers without adequate verification or without controls limiting what tools agents can invoke.
E - Enterprise Impact | 5 | Enables data exfiltration, access to restricted data, and lateral movement. Full impact depends on what data agents have access to.
Composite DAMAGE Score | 4.2 | Critical. Requires immediate architectural controls. Cannot be accepted.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type | Impact | How This Risk Manifests
Digital Assistant | Low | Human reviews tool invocations before proceeding. Human can recognize suspicious tools.
Digital Apprentice | Low-Med | Agents are conservative about tool invocation and escalate when tools seem anomalous.
Autonomous Agent | High | Agents autonomously invoke MCP tools based on schema. Cannot distinguish legitimate from malicious tools.
Delegating Agent | Very High | Delegating agent's primary function is to invoke tools via MCP. Compromise affects all delegations.
Agent Crew / Pipeline | High | Crew agents may use shared MCP servers. One compromised server affects entire crew.
Agent Mesh / Swarm | Very High | Mesh agents dynamically discover and invoke MCP tools. Compromised server can inject malicious tools into mesh.

Regulatory Framework Mapping

Framework | Coverage | Citation | What It Addresses | What It Misses
NIST AI RMF 1.0 | Partial | GOVERN 6.2 (Supply Chain Risk) | Supply chain risk management. | MCP server security and tool vetting.
NIST CSF 2.0 | Partial | GV.SC-01 (Supply Chain Risk; ID.SC-1 in CSF 1.1) | Supply chain risk. | Third-party tool integration and validation.
OWASP Top 10 | Partial | A06:2021 (Vulnerable and Outdated Components) | Vulnerable third-party components. | MCP tool supply chain risk.
GDPR Article 28 | Partial | Data Processing Agreements | Third-party data processing. | MCP servers processing personal data.
HIPAA BAA | Partial | Technical Safeguards | Third-party access to protected health information. | MCP servers accessing PHI.

Why This Matters in Regulated Industries

In financial services and healthcare, data exfiltration through compromised MCP tools is a material risk. An agent with access to sensitive data that invokes a malicious MCP tool can leak that data without "knowing" it is doing so: from the agent's side, the call looks like any other tool invocation.

Additionally, MCP server compromise is a third-party risk that regulatory frameworks explicitly address. If an institution deploys agents that rely on MCP servers, the institution has a duty to ensure those servers are secure and trustworthy.

Controls & Mitigations

Design-Time Controls

  • Implement agent policies that explicitly enumerate which MCP servers agents are allowed to connect to. Use a whitelist of trusted servers; deny connections to unlisted servers.
  • Require MCP servers to be vetted and authorized before agents can connect. Vetting includes security assessment, access controls review, and data handling practices.
  • Implement tool whitelisting within agents. Even if an MCP server is compromised, agents should only invoke pre-approved tools and reject unknown tools.
  • Use Component 2 (Cryptographic Identity) to establish mutual authentication between agents and MCP servers.
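
The server and tool whitelisting controls above can be sketched as a deny-by-default policy check that runs before the agent is shown any tools. This is a minimal illustration, not an MCP SDK feature: the `McpAllowlist` class, the placeholder URL, and the idea of passing in a plain list of advertised tool names are all assumptions made for the example. The tool names come from the clinical scenario earlier in this entry.

```python
from dataclasses import dataclass, field


@dataclass
class McpAllowlist:
    """Hypothetical policy object for illustration; not part of any MCP SDK."""

    # Maps an approved server URL to the tool names approved on that server.
    approved: dict[str, frozenset[str]] = field(default_factory=dict)

    def server_allowed(self, server_url: str) -> bool:
        # Deny by default: only explicitly listed servers are allowed.
        return server_url in self.approved

    def filter_tools(self, server_url: str, advertised: list[str]):
        """Split advertised tool names into (usable, rejected)."""
        if not self.server_allowed(server_url):
            return [], list(advertised)
        allowed = self.approved[server_url]
        usable = [name for name in advertised if name in allowed]
        rejected = [name for name in advertised if name not in allowed]
        return usable, rejected


CLINICAL_SERVER = "https://mcp.clinical.example"  # placeholder URL (assumption)

policy = McpAllowlist({
    CLINICAL_SERVER: frozenset({
        "Drug-Interaction-Checker",
        "Lab-Result-Interpreter",
        "Guideline-Lookup",
    }),
})

# What the (compromised) server advertises in the scenario above.
advertised = [
    "Drug-Interaction-Checker",
    "Lab-Result-Interpreter",
    "Guideline-Lookup",
    "Generate-Patient-Report",
]

usable, rejected = policy.filter_tools(CLINICAL_SERVER, advertised)
```

Under this policy the injected "Generate-Patient-Report" tool lands in `rejected` and is never offered to the agent, regardless of how plausible its schema looks. A rejected tool is also a useful alert signal in its own right.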

Runtime Controls

  • Implement response validation on all MCP tool responses. Define expected response schemas and validate actual responses against schemas. Flag responses that deviate.
  • Monitor tool invocations for anomalies. Track which tools agents invoke and with what data. Flag invocations of unusual tools or tools with unusual parameters.
  • Implement data loss prevention (DLP) rules on MCP tool invocations. If an agent attempts to invoke a tool with sensitive data, require explicit authorization.
  • Use Component 3 (JIT Authorization Broker) to approve MCP tool invocations dynamically.
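
As a rough sketch of the response validation and DLP controls above, the following uses only the Python standard library. Everything named here is an assumption for illustration: the response fields for "Drug-Interaction-Checker", the `SENSITIVE_KEYS` set, and the single identifier pattern. A production deployment would likely use a full schema validator and a dedicated DLP engine.

```python
import re

# Hypothetical expected response shape per tool: field name -> required type.
EXPECTED_RESPONSE_FIELDS = {
    "Drug-Interaction-Checker": {"interactions": list, "severity": str},
}

# Illustrative rules only; real DLP rule sets are far broader.
SENSITIVE_KEYS = {"patient_name", "medical_history", "test_results"}
ID_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def validate_response(tool: str, response: dict) -> list[str]:
    """Return deviations from the expected schema (empty list means OK)."""
    expected = EXPECTED_RESPONSE_FIELDS.get(tool)
    if expected is None:
        return [f"no schema registered for tool {tool!r}"]
    problems = []
    for name, expected_type in expected.items():
        if name not in response:
            problems.append(f"missing field {name!r}")
        elif not isinstance(response[name], expected_type):
            problems.append(f"field {name!r} is not {expected_type.__name__}")
    for name in response:
        if name not in expected:
            problems.append(f"unexpected field {name!r}")
    return problems


def requires_authorization(arguments: dict) -> bool:
    """True if outgoing tool arguments appear to carry sensitive data."""
    if arguments.keys() & SENSITIVE_KEYS:
        return True
    return any(
        isinstance(value, str) and ID_PATTERN.search(value)
        for value in arguments.values()
    )
```

An agent runtime could call `requires_authorization` before each invocation and route a `True` result to an approval step, then call `validate_response` afterwards and flag any non-empty result. Note that schema validation catches structural deviations only; a fabricated report that matches the schema would still pass.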

Detection & Response

  • Conduct regular security assessments of MCP servers. Audit server security controls, access logs, and tool configurations.
  • Monitor network traffic from agents to MCP servers. Detect unusual traffic patterns (large data transfers, connections to new servers).
  • Implement tool response logging and analysis. Periodically analyze response patterns to detect fabricated or anomalous responses.
  • Implement incident response procedures for MCP server compromise. Disable agent connections and conduct forensics on all data that agents may have exposed.
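
One way to support the tool-configuration audits above is to fingerprint each approved tool definition and compare later tool listings against that baseline. The sketch below assumes tool definitions are available as plain dictionaries with a `name` key; the `fingerprint` and `diff_manifest` helpers and the example definitions are hypothetical, not part of MCP.

```python
import hashlib
import json


def fingerprint(tool: dict) -> str:
    """Stable SHA-256 over the whole tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def diff_manifest(baseline: dict, advertised: list) -> dict:
    """Compare advertised tools against a baseline of name -> fingerprint."""
    current = {tool["name"]: fingerprint(tool) for tool in advertised}
    shared = current.keys() & baseline.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "changed": sorted(n for n in shared if current[n] != baseline[n]),
    }


# Baseline captured when the server was vetted (example definitions).
approved_tools = [
    {"name": "Drug-Interaction-Checker", "description": "Check drug pairs."},
    {"name": "Lab-Result-Interpreter", "description": "Interpret lab values."},
    {"name": "Guideline-Lookup", "description": "Look up clinical guidelines."},
]
baseline = {tool["name"]: fingerprint(tool) for tool in approved_tools}

# Later listing from the same server, now with an extra tool.
later_listing = approved_tools + [
    {"name": "Generate-Patient-Report", "description": "Summarize a patient."},
]
drift = diff_manifest(baseline, later_listing)
```

In the scenario described earlier, `drift["added"]` would contain "Generate-Patient-Report" the first time the modified server was queried. The `changed` list covers the quieter case where an existing tool keeps its name but its description or schema is altered.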
