Agent delegation creates undocumented dependency chains; failure cascades through systems with no documented dependency relationship; blast radius exceeds disaster recovery planning.
In traditional infrastructure, dependencies are documented and modeled. System A depends on System B; System B depends on Database C. Disaster recovery planning is based on this dependency map. When agents dynamically invoke APIs and delegate to other systems through function calling or message queues, they create dependencies that may not be explicitly documented in the infrastructure's dependency map.
When one system in the chain fails, the agent may be unaware of the failure (if it is not immediately returned as an error code). The agent continues operating, queuing requests that accumulate. Dependent systems waiting for responses stall. The failure cascades beyond what the infrastructure team anticipated. Disaster recovery planning, which is based on documented dependencies, does not account for the undocumented chains created by agents. The organization discovers, during a failure event, that the actual blast radius is much larger than the documented blast radius.
A major insurance company operates a claims processing platform built on microservices. The infrastructure team has documented these dependencies and created a disaster recovery plan. The company deploys an agentic workflow manager to accelerate claims throughput. The agent is authorized to invoke the claims entry API, then the claims validation API, then conditionally the medical review API, then the payment API. Additionally, the agent invokes a third-party data enrichment API and writes decision logs to an external compliance archive.
One morning, the third-party data enrichment API experiences an outage. The agent retries 3 times, consuming 15 seconds. After the third retry, the agent proceeds to the validation step, which requires enrichment data. The validation fails. The agent retries the entire sequence, creating a loop. Meanwhile, the agent's decision log writes are queuing up to the compliance archive, which receives thousands of logs per minute and becomes unresponsive.
The archive failure cascades to the customer communication service, which depends on the archive for audit trail verification. The communication service stops sending notifications. The infrastructure team's disaster recovery plan does not mention the enrichment API or the communication service as dependencies of the claims platform. The incident requires emergency escalation to identify and stop the agent, rebuild the compliance archive, and manually notify customers. The incident duration is 2 hours, far exceeding the organization's RTO of 15 minutes.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 5 | Cascading failures are difficult to detect because the root cause (agent-induced load on an external system) is indirect and not monitored. Standard infrastructure monitoring may not correlate the agent's actions with the cascade. |
| A - Autonomy Sensitivity | 5 | The risk manifests in autonomous agents that create dynamic dependencies through function calling or delegation. |
| M - Multiplicative Potential | 5 | A single agent error (retrying a degraded API) can cascade across multiple systems exponentially. |
| A - Attack Surface | 5 | Any agent with access to multiple APIs or systems can create the undocumented dependency chain. |
| G - Governance Gap | 5 | Standard infrastructure dependency mapping does not account for agent-dynamic dependencies. DR planning is based on static dependencies and does not account for agent-induced chains. |
| E - Enterprise Impact | 5 | Cascading failure can affect all customer-facing services and require extended outages to remediate. |
| Composite DAMAGE Score | 4.2 | Critical. Requires immediate architectural controls. Cannot be accepted. |
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human operator sequences tool invocations and may avoid cascade-prone patterns. |
| Digital Apprentice | Medium | Limited autonomy; cascade chains are shorter. |
| Autonomous Agent | High | Autonomous tool invocation creates dynamic dependency chains. |
| Delegating Agent | Critical | Function calling and dynamic API invocation are core to the design; cascades are likely. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequence can create compound dependency chains. |
| Agent Mesh / Swarm | Critical | Peer-to-peer delegation creates unpredictable dependency topologies. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| DORA Article 17 | Relevant | Operational Resilience | Scenario analysis; testing of cascading failures. | Agent-induced cascades distinct from infrastructure failures. |
| FFIEC Business Continuity | Relevant | Dependency Mapping | Dependency mapping; recovery planning. | Agent-dynamic dependencies not captured in static maps. |
| MAS TRM Guidelines | Relevant | Technology Risk Management | Resilience testing; failure scenario analysis. | Agent-centric cascading failures. |
| GLBA Section 501 | Relevant | Safeguarding Rule | Security controls; operational continuity. | Agent-induced operational failures. |
| ISO 42001 | Minimal | Section 8.1 | AI system governance; planning. | Agent-induced dependency chains and cascading failures. |
| NIST CSF 2.0 | Partial | Govern and Protect Functions | Dependency identification; failure recovery. | Agent-dynamic dependency chains. |
Regulators expect institutions to understand their operational infrastructure and to plan for failure scenarios. When a cascading failure occurs that was not anticipated in the DR plan, regulators ask whether the organization understood the true dependencies in its systems. If an agent created an undocumented dependency chain that the organization did not know about, regulators cite this as a governance failure.
The regulatory consequence is that the organization must revise its dependency mapping, revise its DR plan, and possibly revise its operational risk capital allocation. If the cascade caused customer impact or service disruption beyond the documented RTO, regulators may impose remedial action orders or increased capital requirements.
In financial services, cascading failures can affect settlement, clearing, and payment systems, which are critical to the broader financial system. Regulators (Federal Reserve, OCC) take infrastructure resilience seriously and expect institutions to be able to articulate the recovery scenario for every system in their operational footprint.
Cascading Infrastructure Failure requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing