Every agent deployed in a regulated financial institution interacts with systems that already have governance, controls, and risk owners. This article maps where those interactions create risk.
The agent does not operate in a vacuum. It queries ML models, writes to transaction systems, navigates workflow engines, reads from data warehouses, communicates through protocols like A2A and MCP, authenticates through identity systems, and interacts with humans through chat channels and email.
At each of these touchpoints, the agent meets a system that was designed for human users or deterministic software. The controls governing that system assume structured inputs, predictable behavior, and accountable callers. Agents violate all three assumptions: they send unstructured requests, behave non-deterministically, and operate with undefined accountability.
This article maps the seven infrastructure touchpoints where agents interact with institutional systems, identifies the existing control framework at each point, names the risk owner, and explains what changes when an agent is in the loop. Each section links to specific risks in the Agentic AI Risk Catalog for Highly-Regulated Industries.
Traditional model consumers are applications with defined interfaces: a credit scoring API receives structured input and returns a score. The application's use of the score is deterministic and auditable. The model validation team knows exactly how the model's output enters the decision.
An agent consumes model outputs through natural language reasoning. It calls "the credit model" without necessarily specifying a version. It treats the returned score not as an estimate with confidence intervals but as a fact to be woven into prose. The uncertainty inherent in the model's output evaporates when the agent writes: "The applicant's credit risk is moderate, supporting approval at the requested amount." A human reading that sentence has no visibility into the model's confidence interval, the feature values that drove the score, or whether the model version has changed since validation.
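One mitigation is to hand the agent model outputs as structured estimates rather than bare numbers, so the confidence interval and model version survive into whatever prose the agent writes. A minimal sketch, with all type and field names hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScoredOutput:
    """A model output that carries its provenance, not just a point estimate."""
    score: float
    ci_low: float        # lower bound of the confidence interval
    ci_high: float       # upper bound
    model_id: str
    model_version: str   # pinned version, so drift since validation is visible

def render_for_agent(out: ScoredOutput) -> str:
    """Format a score so the uncertainty survives into the agent's context."""
    return (
        f"{out.model_id} v{out.model_version} estimates credit risk "
        f"{out.score:.2f} (90% CI {out.ci_low:.2f}-{out.ci_high:.2f}); "
        "treat as an estimate, not a fact."
    )
```

A human reviewing the agent's output can then see at a glance which model version produced the score and how wide the interval was, instead of reading "moderate risk" as settled fact.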
The agent may also consume outputs from multiple models in a single reasoning pass, creating compound model risk that no individual model validation covers. A pricing agent that combines a credit risk score, a market risk estimate, and a liquidity model output is operating at the intersection of three models. The compound risk of combining these outputs is unowned: model risk management (MRM) validated each model individually, and no one validated the agent's reasoning over their combined outputs.
Agents can also contaminate the data that feeds model retraining. When agent outputs are written to production data stores, they enter the pipeline that future models learn from. If the agent produced errors, those errors become training signal. The model learns from the agent's mistakes and the agent consumes the degraded model's outputs in a feedback loop that neither MRM nor agent governance is designed to detect.
Transaction systems were built for clients that follow protocol: submit a request, wait for confirmation, handle errors through defined exception paths. Every transaction client the system has ever processed is either a human using a UI (with built-in constraints like confirmation dialogs and session timeouts) or a service with deterministic behavior (batch jobs, scheduled payments, API integrations with retry logic).
An agent is neither. It interacts with transaction systems through tool calls mediated by natural language reasoning. If the agent loses context mid-transaction (timeout, context window overflow, delegation to another agent), it may not complete the transaction, may not roll back, and may not even know the transaction was left in an incomplete state. The result is partial commits, orphaned holds, and duplicate submissions that transaction monitoring was not designed to detect because no prior client behaved this way.
More subtly, an agent with delegated authority can both initiate and approve transactions if the approval system validates identity rather than independence. The approval chain is structurally intact: there is an initiator and an approver. But both are the same agent operating with the same delegated authority. Separation of duties collapses.
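The fix is to validate independence rather than identity: compare the full delegation chains behind the initiating and approving actions, not just the final caller. A minimal sketch, assuming each action records its delegation path as a list of principals (names hypothetical):

```python
def same_authority(initiator_chain: list[str], approver_chain: list[str]) -> bool:
    """True if the two actions trace back to any shared principal.

    A chain is the delegation path, e.g. ["alice", "agent-7"] for an agent
    acting on Alice's behalf. Identity-only checks compare the last element;
    independence requires the whole chains to be disjoint.
    """
    return bool(set(initiator_chain) & set(approver_chain))

def approve(initiator_chain: list[str], approver_chain: list[str]) -> str:
    if same_authority(initiator_chain, approver_chain):
        raise PermissionError("separation of duties: shared principal in chains")
    return "approved"
```

Under this check, the scenario above fails closed: the same agent (or the same delegating user) appearing in both chains blocks the approval, even though the two actions carry distinct session identities.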
In trading contexts, agent non-determinism creates a specific risk. The same market data, processed by the same agent with the same prompt, can produce different trading signals on successive runs. If the agent is operating with any degree of autonomy over order execution, this non-determinism violates the consistency assumptions that market surveillance systems depend on for detecting manipulation patterns.
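One pragmatic gate is a replay check: before an autonomously generated signal reaches order execution, regenerate it and require agreement across runs. This is a sketch under the assumption that signal generation can be invoked repeatedly on the same inputs; `signal_fn` is a hypothetical stand-in for the agent's signal pipeline:

```python
def consistent(signal_fn, market_data, runs: int = 3) -> bool:
    """Gate autonomous execution on reproducible signals.

    Re-invokes the (possibly non-deterministic) signal generator and
    requires all runs to agree before the signal is eligible to execute.
    """
    signals = [signal_fn(market_data) for _ in range(runs)]
    return all(s == signals[0] for s in signals)
```

A replay gate does not make the agent deterministic; it quarantines the cases where non-determinism actually manifests, which is where the surveillance assumptions break.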
Workflow engines maintain state machines: a case is in "pending review," transitions to "approved" or "rejected," and cannot skip steps. Every workflow participant the engine has processed is a human who interacts through a UI that constrains available actions to valid transitions, or a service integration that follows the engine's API contract.
An agent interacts with workflow engines through API calls that may not respect state machine constraints. An agent can advance a case past a required review step if the API permits it, create parallel branches that the workflow engine treats as valid but that violate the process designer's intent, or leave workflow instances in states that have no defined transition.
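The countermeasure is to enforce the state machine at the API layer rather than relying on the UI to constrain callers. A minimal sketch with a hypothetical review workflow (states and transitions are illustrative):

```python
# Allowed transitions for a hypothetical case-review workflow.
TRANSITIONS: dict[str, set[str]] = {
    "submitted": {"pending_review"},
    "pending_review": {"approved", "rejected"},
    "approved": set(),   # terminal
    "rejected": set(),   # terminal
}

def advance(current: str, target: str) -> str:
    """Enforce valid transitions server-side, for agents and humans alike."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {target}")
    return target
```

With this check in the API itself, an agent that tries to jump a case from "submitted" straight to "approved" gets an error instead of a silently skipped review step.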
The deeper problem is that agents interact with workflows at a speed and scale that human-designed processes were not built for. A human case manager processes a case in 30 minutes and moves to the next. An agent can process hundreds of cases per hour. If the agent has a systematic error in its reasoning, the error is replicated across hundreds of workflow instances before anyone notices. The workflow engine's SLA monitoring shows green because cases are being processed faster than ever. The quality of the processing is invisible to the workflow metrics.
Agents also struggle with the informal coordination that makes workflows actually work. In practice, complex cases involve sidebar conversations, judgment calls, and contextual decisions that workflow state machines do not capture. A human reviewer knows to call the relationship manager before rejecting a long-standing client's application. An agent follows the workflow's formal definition and rejects without the call. The workflow completed correctly by its own metrics. The business outcome was wrong.
Data infrastructure was built for structured consumers: BI tools that run SQL queries, ETL jobs that transform data through defined pipelines, applications that read and write through APIs with defined schemas. Every data consumer the infrastructure has processed follows a pattern that data governance can trace: the query came from this application, transformed data through this pipeline, and wrote results to this table.
An agent breaks this pattern at every level. It consumes data through natural language interfaces where the "query" is a prompt and the "transformation" is generative reasoning. Data lineage tracking has nothing to trace because there is no structured query, no ETL job, no defined transformation. The agent ingests data from three sources into a context window, reasons generatively, and writes output whose relationship to any specific input cannot be decomposed.
Data classification controls fail because they operate at the system boundary (database ACLs, network segmentation, DLP), not inside the reasoning process. When an agent combines public market data with restricted customer records in a single reasoning pass, the output is a derivative of both classification tiers. No system-level control detects this because the commingling happens inside the agent, not at a data boundary.
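A control that can work here is label propagation at the point where the context window is assembled: tag every input with its classification tier and stamp the derived output with the most restrictive tier among them. A minimal sketch (tier names are illustrative):

```python
# Ordered classification tiers; higher index = more restrictive.
TIERS = ["public", "internal", "confidential", "restricted"]

def output_tier(input_tiers: list[str]) -> str:
    """Label derived output with the most restrictive input tier.

    Boundary controls (ACLs, DLP) never see the commingling inside the
    agent, so the label must be computed where the context is assembled.
    """
    return max(input_tiers, key=TIERS.index)
```

The output of the reasoning pass described above (public market data plus restricted customer records) would be labeled "restricted," and downstream handling rules can key off that label even though no system boundary observed the mix.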
Agents also create ungoverned data replicas. Tool workspaces, vector database embeddings, context caches, and intermediate stores all contain copies of institutional data that exist outside the chief data officer's (CDO's) governance perimeter. Retention policies, access controls, and deletion obligations do not extend to these replicas because the data governance framework does not know they exist.
Communication infrastructure was built for services with fixed interfaces: Service A calls Service B at a known endpoint with a defined schema. The API gateway validates the request, the service mesh handles routing and retry, and OAuth ensures the caller is authorized. Every participant in the communication is registered, versioned, and monitored.
Agent communication protocols introduce dynamic discovery and runtime capability acquisition. A2A lets agents find other agents through Agent Cards (JSON metadata describing capabilities). MCP lets agents connect to tool servers that expose capabilities and resources. Skills and plugins let agents install new capabilities from registries. None of these mechanisms were anticipated by existing API governance.
The trust model breaks in specific ways. An A2A Agent Card declares what an agent can do, but the card is metadata, not a cryptographic guarantee. A spoofed card causes other agents to delegate sensitive tasks to an imposter. An MCP server exposes tools and resources, but the agent that connects to it trusts everything the server advertises. A compromised MCP server becomes a tool injection vector. A skill installed from a marketplace is executable code that runs with the agent's permissions, bypassing the institution's CI/CD pipeline and change management controls.
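Since neither an Agent Card nor an MCP server manifest is a cryptographic guarantee, one compensating control is to pin the digest of each reviewed card or manifest and refuse delegation when the live metadata no longer matches. This is a sketch of that pinning pattern, not part of either protocol; the card fields and function names are hypothetical:

```python
import hashlib
import json

def digest(card: dict) -> str:
    """Canonical SHA-256 digest of a card or manifest, pinned at review time."""
    canonical = json.dumps(card, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

def may_delegate(card: dict, pinned: dict[str, str]) -> bool:
    """Delegate only to agents whose live card matches the reviewed digest.

    Any change to the card (new skills, new endpoints) breaks the match and
    forces re-review through change management rather than silent trust.
    """
    name = card.get("name")
    return name in pinned and pinned[name] == digest(card)
```

A spoofed or quietly modified card then fails the check, which converts "trust whatever the metadata advertises" into "trust only what was reviewed."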
Cross-organizational delegation is the most consequential gap. When a bank's agent delegates a KYC check to a third-party agent via A2A, the delegating agent trusts the receiving agent's output without visibility into its reasoning, data handling, or compliance posture. This creates a dynamic third-party relationship that the institution's third-party risk management (TPRM) program does not cover because no contract event triggered the assessment.
Identity and access management (IAM) was designed for two types of callers: humans who authenticate through credentials and MFA, and services that authenticate through API keys, certificates, or OAuth tokens. Access is granted based on identity, role, and context. The system assumes that the permissions granted to an identity are the permissions that identity exercises.
Agents break this assumption through delegation and tool connectivity. When a high-privilege user invokes a low-privilege agent, the agent may inherit the user's session context, including permissions the agent was never designed to exercise. The agent's identity in the IAM system says "read-only reporting agent." Its effective permissions at runtime say "whatever the invoking user can do." IAM tracks the agent's registered identity, not its cumulative operational authority.
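The least-privilege remedy is to compute the agent's effective permissions as the intersection of its registered scope and the invoker's scope, rather than letting it inherit the user's full session. A minimal sketch (scope strings are illustrative):

```python
def effective_scopes(agent_scopes: set[str], invoker_scopes: set[str]) -> set[str]:
    """Least privilege under delegation.

    The agent may exercise only permissions that BOTH its registered
    identity and the invoking user hold; a high-privilege invoker cannot
    silently widen what a read-only agent can do.
    """
    return agent_scopes & invoker_scopes
```

Under this rule, the "read-only reporting agent" invoked by a high-privilege user still operates read-only, and its runtime authority matches what the IAM system records.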
Secrets management faces a parallel challenge. Agents receive credentials to invoke tools and APIs. These credentials enter the agent's context window, may appear in logs, can be passed to downstream agents through delegation, and may persist in conversation history. Existing secrets rotation and monitoring track credential access at the vault level. They do not track credential exposure through agent reasoning and context propagation.
Cross-system identity fragmentation means an agent verified in one platform (authenticated, authorized, audited) is anonymous in another. When the agent invokes a tool in a different system, its identity does not propagate. The downstream system sees an API call from a service account, not from a specific agent with a specific mission operating under specific constraints. Audit trails break at system boundaries.
Human communication channels were designed for humans. The governance frameworks surrounding these channels assume human participants: communication monitoring screens for human behaviors (insider trading language, market manipulation signals, conduct risk indicators), records retention captures communications between identified humans, and acceptable use policies govern how employees communicate.
Agents operating in these channels introduce three distinct problems. First, agents can be indistinguishable from human participants. A message posted by an agent in a Slack channel looks the same as a message posted by a colleague. If the institution has no policy requiring agents to identify themselves, humans interact with agent outputs as if they were human opinions, giving them credibility and weight they may not deserve.
Second, agents in human channels become social engineering targets and vectors. An agent that reads and responds to messages can be manipulated through crafted interactions that exploit its prompt-following nature. The same agent can be weaponized to deliver social engineering attacks to humans: crafting persuasive requests, citing fabricated authority, or creating artificial urgency. Users' trained skepticism toward phishing emails does not extend to messages from an internal agent in a trusted channel.
Third, communication surveillance systems designed to detect human misconduct produce false signals or miss genuine risks when agents are participants. An agent that generates a large volume of messages about trading activity may trigger surveillance alerts designed for humans, wasting compliance resources. Conversely, an agent that facilitates information barrier breaches through its outputs in a chat channel may not trigger any alert because the surveillance system is looking for human language patterns, not agent-mediated information leakage.
This map reveals a structural problem that no single risk framework, risk owner, or governance committee can solve alone. Agents touch ML models (owned by MRM), transaction systems (owned by Operations), workflows (owned by Process Owners), data infrastructure (owned by CDO), communication protocols (owned by Platform Engineering), identity systems (owned by CISO), and human channels (owned by Compliance). Each risk owner governs their system. None governs the agent that crosses all of them.
The 133 risks in the Agentic AI Risk Catalog for Highly-Regulated Industries are not abstract possibilities. They materialize at these seven touchpoints, in the specific systems your institution already operates, under the specific governance frameworks your teams already maintain. The question is not whether your institution has controls. It is whether those controls function when the caller is an autonomous agent rather than a human or a deterministic service.
Mapping your agent deployments to these touchpoints, identifying which risk owners need to be in the room, and assessing which existing controls will and will not function under agentic processing is the first step toward governance that works in practice, not just on paper.
Identify where your agents touch institutional infrastructure, which controls hold, and which need augmentation.
Schedule a Briefing