A single agent interaction that triggers a reasoning loop can consume thousands of dollars in API costs in minutes. Cost monitoring operates on billing cycles; cost runaway operates on seconds.
Model APIs charge by the number of tokens processed (input and output tokens). An agent with recursive reasoning, multi-step delegation, or iterative refinement may consume many tokens per request. Cost is typically tracked on monthly billing cycles. An agent that causes token consumption to spike due to unexpected recursive reasoning or delegation loops may consume weeks' worth of budget in hours.
This creates a cost runaway risk: an agent that is supposed to cost $100/month due to anticipated usage patterns may cost $10,000/month if reasoning becomes unexpectedly recursive or if a prompt change causes the agent to delegate to multiple sub-agents per request. The institution discovers the cost overrun only when the monthly bill arrives.
The runaway risk is amplified by non-transparency: an institution may not know which agent is consuming the most tokens because token consumption is not tracked per-agent in real-time. Token consumption is reported at the API level, not at the application level.
A bank's AML team deploys an agent to perform automated SAR (suspicious activity report) generation. The agent is expected to process 1,000 transactions per day, costing approximately $5,000/month in tokens. The institution budgets $5,000/month for the agent's usage.
Over several weeks, new requirements are added: the agent should now cross-reference transaction counterparties against an expanded sanctions watch list, query additional external data sources, and provide more detailed narrative explanation. Each requirement adds complexity and token consumption. The agent's reasoning becomes more recursive as it evaluates multiple data sources and refines its analysis.
After three months, the first bill arrives: $35,000 instead of $15,000 (expected 3-month cost). The agent's token consumption has increased 7x. The institution investigates and discovers the agent is now consuming approximately 100 tokens per transaction instead of the originally anticipated 15 tokens. The agent's recursive reasoning and additional data source queries caused the explosion.
The institution must either reduce the agent's functionality to bring costs back in line, or accept the new cost baseline. Either way, the institution has been surprised by cost overrun and lacks mechanisms to detect and control it.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 2 | Cost overrun is detected only when bills arrive. Real-time detection requires explicit per-agent cost monitoring. |
| A - Autonomy Sensitivity | 2 | Cost issues affect all autonomy levels; structural to token-based pricing. |
| M - Multiplicative Potential | 4 | Each additional agent, each recursive reasoning step, each delegation multiplies token consumption. |
| A - Attack Surface | 2 | Adversary could intentionally cause cost runaway, but runaway occurs naturally through feature expansion. |
| G - Governance Gap | 4 | Cost governance assumes costs track linearly with usage. Recursive reasoning causes superlinear cost scaling. |
| E - Enterprise Impact | 2 | Cost overruns are undesirable but typically manageable. Impact is financial, not safety-critical. |
| Composite DAMAGE Score | 3.5 | High. Requires priority attention and dedicated controls. |
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human user may notice cost implications if explicitly aware of pricing. |
| Digital Apprentice | Moderate | Progressive autonomy increases token consumption frequency. |
| Autonomous Agent | High | Autonomous agent consumes tokens continuously without human awareness of cost implications. |
| Delegating Agent | Critical | Agent delegates frequently; each delegation consumes tokens. Recursive delegation causes exponential token consumption. |
| Agent Crew / Pipeline | Critical | Multiple agents in pipeline, each consuming tokens. Total cost is multiplicative. |
| Agent Mesh / Swarm | Critical | Peer-to-peer agent network with dynamic delegation. Token consumption is unpredictable and potentially exponential. |
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| NIST CSF 2.0 | Partial | GOVERN (Resource Mgmt) | Addresses resource management and cost governance. | Does not specifically address AI token consumption. |
| BCBS 239 | Minimal | Operational Risk | General operational risk governance. | Does not address AI cost management. |
| EU AI Act | Minimal | General governance | General governance principles. | Does not address cost management. |
| MAS AIRG | Minimal | General governance | General governance principles. | Does not address cost monitoring. |
Cost control is an operational governance requirement. An institution that cannot control or predict its costs for critical systems violates operational risk management principles. Regulators expect institutions to have cost budgets, cost tracking, and cost controls for all material expenses. An institution that experiences 7x cost overruns without detection or control demonstrates weak cost governance.
Additionally, token costs are directly correlated with operational risk. The more tokens an agent consumes, the longer it takes to process requests, the more risk of cost escalation, the more risk of service degradation if costs become prohibitive.
Token Economics and Cost Runaway requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.
Schedule a Briefing