R-OR-04 Operational Resilience DAMAGE 3.8 / High

Resource Exhaustion and Runaway Loops

Agents create recursive delegation loops in which each invocation appears to be a legitimate, independent request. The loop consumes compute until infrastructure fails.

The Risk

Autonomous agents can invoke tools and APIs, and in some architectures they can also delegate to other agents through APIs or message queues. When delegation patterns are not carefully constrained, an agent can initiate a task that invokes another agent, which invokes a third agent, which invokes the first agent again, creating a loop. Each individual invocation is legitimate: Agent A calls Agent B; Agent B calls Agent C; Agent C calls Agent A. Each call uses correct credentials and targets the correct API. In aggregate, however, the pattern has the same effect as a resource-exhaustion attack.

Standard rate limiting operates per-caller. If Agent A makes 100 calls per minute, the system flags it as an anomaly. But if Agent A makes 10 calls per minute to Agent B, Agent B makes 10 calls per minute to Agent C, and Agent C makes 10 calls per minute back to Agent A, the system sees three actors each operating within their rate limit. Runaway loops can also occur without explicit delegation: an agent may invoke a tool, receive an error, decide to retry, and loop indefinitely if the retry logic does not have a termination condition.
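The retry case can be bounded with an explicit attempt budget. The following is a minimal sketch, not a definitive implementation: the names `call_with_retry_budget` and `RetryBudgetExceeded`, the default of three attempts, and the backoff schedule are all illustrative assumptions rather than anything prescribed by this risk entry.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryBudgetExceeded(RuntimeError):
    """Raised when a tool call still fails after the allowed attempts."""


def call_with_retry_budget(
    tool: Callable[[], T],
    max_attempts: int = 3,      # assumed default; tune per tool
    base_delay_s: float = 0.0,  # assumed backoff base; 0 disables waiting
) -> T:
    """Call `tool`, retrying on failure, but never more than `max_attempts` times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        try:
            return tool()
        except Exception as exc:  # illustrative; real code should catch narrower errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts: base, 2*base, 4*base, ...
                time.sleep(base_delay_s * (2 ** attempt))
    # The termination condition: give up and surface the failure to the caller.
    raise RetryBudgetExceeded(f"gave up after {max_attempts} attempts") from last_error
```

The point of the design is that the loop bound lives outside the agent's own judgment: the agent can decide to retry, but the wrapper decides when retrying stops.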

How It Materializes

A financial services firm deploys three customer service agents to handle inquiries: Agent A handles account inquiries, Agent B handles transaction inquiries, and Agent C handles dispute resolution. Each agent is authorized to call the others when a customer's inquiry spans multiple domains. The agents are configured with a "delegate to peer" capability.

A customer contacts Agent A with an inquiry spanning all three domains: "I made a payment but I am not sure if it posted. The receiving bank might be disputing the amount." Agent A gathers account information, then delegates to Agent B for transaction details. Agent B retrieves transaction data and identifies the transaction is in dispute status, so it delegates to Agent C. Agent C retrieves dispute data and finds the dispute is flagged as a potential account-level issue requiring account investigation, so it delegates back to Agent A.

The loop is now in motion. Each agent is performing legitimate work (querying APIs, retrieving data), but the delegation chain has created a cycle. After 5 cycles, there have been 15 delegation calls; after 10 cycles, 30. Each of those calls triggers its own back-end queries, and because nothing terminates the loop, the count keeps growing. The transaction ledger API and the dispute API both become saturated. Other customers' inquiries are delayed or dropped. The firm's infrastructure team detects a resource exhaustion event: load on the transaction API spikes 500%. The incident requires post-mortem analysis and rollback of the agent deployment.
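The arithmetic in this scenario can be reproduced with a short simulation. This is a sketch under stated assumptions: the function names are hypothetical, each hop is counted as exactly one delegation call, and the per-agent limit of 10 is borrowed from the rate-limiting example earlier and treated as applying to a single time window.

```python
from collections import Counter
from itertools import cycle, islice
from typing import List

# Delegation order taken from the scenario: A -> B -> C -> A -> ...
DELEGATION_CYCLE = ["A", "B", "C"]


def simulate_loop(cycles: int) -> Counter:
    """Count invocations per agent after `cycles` full trips around the loop."""
    hops = cycles * len(DELEGATION_CYCLE)
    return Counter(islice(cycle(DELEGATION_CYCLE), hops))


def flagged_by_per_agent_limit(calls: Counter, limit_per_agent: int) -> List[str]:
    """Return the agents a per-agent limit would flag (count above the limit)."""
    return sorted(agent for agent, n in calls.items() if n > limit_per_agent)


after_5 = simulate_loop(5)
after_10 = simulate_loop(10)
total_after_5 = sum(after_5.values())    # 15 delegation calls
total_after_10 = sum(after_10.values())  # 30 delegation calls
# Every agent has made exactly 10 calls, so a limit of 10 flags nobody.
flagged = flagged_by_per_agent_limit(after_10, limit_per_agent=10)
```

The total grows with every cycle while each agent's individual count stays at or under its limit, which is why a per-agent view does not reveal the loop.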

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
| --- | --- | --- |
| D - Detectability | 3 | Circular delegation patterns are detectable through call graph analysis, but this requires explicit monitoring. Standard per-agent rate limits do not flag circular patterns. |
| A - Autonomy Sensitivity | 5 | The risk manifests only in agents with autonomous delegation authority. Agents that require human approval for each delegation do not present this risk. |
| M - Multiplicative Potential | 4 | A simple loop adds load linearly with each cycle, but load grows exponentially when each invocation fans out to more than one delegate. A 3-agent loop whose load doubles each cycle can overwhelm infrastructure in 10 to 15 cycles. |
| A - Attack Surface | 4 | Any agent with delegation authority or recursive tool invocation capability can create loops. As agent-to-agent APIs become common, the surface expands. |
| G - Governance Gap | 5 | Standard operational resilience controls (rate limiting, circuit breakers) operate per-actor and do not account for peer-to-peer delegation. Agent governance frameworks do not mandate call graph analysis or loop detection. |
| E - Enterprise Impact | 4 | Resource exhaustion can degrade service for all customers, create SLA violations, and require manual intervention to resolve. |
| Composite DAMAGE Score | 3.8 | High. Requires dedicated controls and monitoring. Should not be accepted without mitigations. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
| --- | --- | --- |
| Digital Assistant | Low | Human-in-the-loop prevents autonomous looping. |
| Digital Apprentice | Low | Limited delegation authority; loops confined to narrow scope. |
| Autonomous Agent | High | Can delegate autonomously; loops are possible. |
| Delegating Agent | Critical | Specifically designed for tool and API invocation; recursive invocation patterns are likely. |
| Agent Crew / Pipeline | Critical | Multiple agents in structured workflow; if any can delegate back to prior agents, loops are created. |
| Agent Mesh / Swarm | Critical | Peer-to-peer delegation is the defining characteristic; loops are a primary failure mode. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
| --- | --- | --- | --- | --- |
| DORA Article 17 | Relevant | Operational Resilience | Resilience to operational disruptions; capacity and scalability. | Agent-induced resource exhaustion distinct from infrastructure failure. |
| FFIEC IT Handbook | Partial | Capacity Planning | Capacity planning; performance monitoring. | Agent-induced load spikes not accounted for in capacity models. |
| MAS TRM Guidelines | Partial | Technology Risk Management | Resilience testing; failure scenario planning. | Agent-induced runaway loops. |
| NIST CSF 2.0 | Partial | Govern Function | Risk assessment; prioritization. | Agent-specific runaway loop risks. |
| OWASP Agentic Top 10 | Relevant | A05: Unbounded Consumption | Resource exhaustion by agents; infinite loops. | Guidance specific to peer-to-peer agent delegation loops. |
| ISO 42001 | Minimal | Section 8.5 | AI system performance; resource utilization. | Agent-induced resource exhaustion and loop detection. |

Why This Matters in Regulated Industries

Operational resilience is a non-negotiable requirement in financial services. Regulators expect institutions to maintain service continuity plans, capacity planning, and incident response procedures. When an agent causes a resource exhaustion event, the institution must demonstrate that this was an unforeseen failure mode (and therefore not a control weakness) rather than a foreseeable operational risk that should have been designed out.

In practice, regulatory scrutiny of agent-induced resource exhaustion will focus on whether the institution had explicit controls to prevent or detect runaway loops. The institution cannot claim that per-agent rate limiting was sufficient if it did not implement call graph monitoring or loop detection. Regulators are likely to ask: "Did you know your agents could delegate to each other? Did you analyze the delegation topology for loops before deployment?"

The operational impact is also significant. Resource exhaustion events degrade customer service, trigger incident response costs, and can cause SLA violations. If the service degradation affects critical functions (e.g., payment processing, account access), regulators may impose operational risk capital charges.

Controls & Mitigations

Design-Time Controls

  • Before deploying any agent with delegation authority, map the delegation topology (which agents can delegate to which agents and APIs). Analyze the graph for cycles. Any cycles must be explicitly justified and guarded with termination conditions.
  • Implement a "call depth" constraint: any delegation chain deeper than N levels (recommend N=3) is automatically terminated. The agent must resolve the request at the current depth or escalate to human review.
  • Require that all agent-to-agent communication use a message queue or event bus with explicit per-queue rate limits, separate from per-agent rate limits.
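The topology analysis in the first control can be sketched as a depth-first search over a directed graph. This is a minimal illustration under assumptions: the topology is represented as a hypothetical dictionary mapping each agent to the agents it may delegate to, and `find_cycle` is an illustrative name, not part of any agent framework.

```python
from typing import Dict, List, Optional

# Hypothetical representation: agent name -> agents it is allowed to delegate to.
Topology = Dict[str, List[str]]


def find_cycle(topology: Topology) -> Optional[List[str]]:
    """Return one delegation cycle as a path (first agent repeated last), or None."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {node: WHITE for node in topology}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        path.append(node)
        for nxt in topology.get(node, []):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path, so we have closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for start in list(topology):
        if colour[start] == WHITE:
            found = visit(start)
            if found:
                return found
    return None


# The three-agent scenario from this entry, expressed as a topology.
scenario = {"A": ["B"], "B": ["C"], "C": ["A"]}
cycle_found = find_cycle(scenario)  # ["A", "B", "C", "A"]
```

A check like this could run as a deployment gate: a non-empty result means the cycle must either be removed or explicitly justified and guarded with a termination condition.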

Runtime Controls

  • Deploy a call graph monitor that tracks all delegation calls in real-time. When the monitor detects a cycle (Agent A to Agent B to Agent C to Agent A), it immediately terminates all calls in the cycle and escalates to human review.
  • Implement per-delegation-chain rate limits in addition to per-agent rate limits. A single delegation chain is bounded to a maximum number of calls within a time window.
  • Use the Blast Radius Calculator (Component 4) to estimate the potential resource consumption of a delegation chain before permitting it.
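One way to enforce the cycle, depth, and per-chain limits at runtime is to attach a small context object to every delegation and validate it before each hop. The sketch below makes several assumptions: `DelegationContext`, `ChainBudget`, and `DelegationRejected` are hypothetical names; the budget counts calls per chain without a time window; and any revisit of an agent is rejected outright, which is stricter than allowing justified, guarded cycles.

```python
from dataclasses import dataclass
from typing import Tuple


class DelegationRejected(RuntimeError):
    """Raised when a delegation would violate a chain-level limit."""


@dataclass
class ChainBudget:
    """Call allowance shared by every hop in one delegation chain."""
    max_calls: int
    used: int = 0

    def spend(self) -> None:
        if self.used >= self.max_calls:
            raise DelegationRejected("chain call budget exhausted")
        self.used += 1


@dataclass(frozen=True)
class DelegationContext:
    """Metadata assumed to travel with every agent-to-agent call."""
    path: Tuple[str, ...]  # agents visited so far, in order
    budget: ChainBudget    # shared across the whole chain
    max_depth: int = 3     # N=3, as recommended in the design-time controls

    def delegate(self, target: str) -> "DelegationContext":
        """Validate one hop and return the context for the next agent."""
        if target in self.path:
            loop = " -> ".join(self.path + (target,))
            raise DelegationRejected(f"cycle detected: {loop}")
        # len(path) is the depth the chain would have after this hop.
        if len(self.path) > self.max_depth:
            raise DelegationRejected(f"depth limit {self.max_depth} exceeded")
        self.budget.spend()
        return DelegationContext(self.path + (target,), self.budget, self.max_depth)


# The scenario's loop is stopped at the hop that would close it.
ctx = DelegationContext(("A",), ChainBudget(max_calls=10))
ctx = ctx.delegate("B").delegate("C")
try:
    ctx.delegate("A")
    loop_blocked = False
except DelegationRejected:
    loop_blocked = True
```

Because the context is immutable and the budget is shared, each agent sees the full path behind it but cannot reset the chain's allowance. On rejection, the calling agent would resolve the request at its current depth or escalate to human review.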

Detection & Response

  • Monitor API and infrastructure metrics for anomalous load patterns. A sudden 10x increase in calls to any API should trigger an alert and automatic investigation.
  • Implement call chain logging that tracks the delegation path for every API call. When a resource exhaustion event occurs, the logs show exactly which agents created the load.
  • Establish an "emergency cutoff" procedure: if a resource exhaustion event is detected, the system should immediately halt all agents involved in the delegation pattern and hold them in quarantine.
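The 10x load alert can be sketched as a comparison between the current window's call count and a rolling baseline. This is an illustrative sketch only: `SpikeDetector`, the five-window history, and the choice of a simple mean as the baseline are assumptions, and a production monitor would normally rely on the institution's existing metrics platform.

```python
from collections import deque
from statistics import mean
from typing import Deque


class SpikeDetector:
    """Flags a window whose call count is at least `factor` times the recent baseline."""

    def __init__(self, factor: float = 10.0, history: int = 5) -> None:
        self.factor = factor
        # Counts from the most recent non-spike windows.
        self.window_counts: Deque[int] = deque(maxlen=history)

    def observe(self, calls_in_window: int) -> bool:
        """Record one window's call count; return True if it is a spike."""
        baseline = mean(self.window_counts) if self.window_counts else 0
        is_spike = baseline > 0 and calls_in_window >= self.factor * baseline
        if not is_spike:
            # Spike windows are kept out of the baseline so that a sustained
            # runaway loop keeps alerting instead of becoming the new normal.
            self.window_counts.append(calls_in_window)
        return is_spike


detector = SpikeDetector()
normal = [detector.observe(n) for n in (100, 110, 90)]  # all False
spike = detector.observe(1000)                          # True: 10x the baseline of 100
```

An alert from a detector like this is only a trigger. Attributing the load to specific agents still depends on the call chain logs that record the delegation path for each API call.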
