R-FM-08 Foundation Model & LLM DAMAGE 3.5 / High

Token Economics and Cost Runaway

A single agent interaction that triggers a reasoning loop can consume thousands of dollars in API costs in minutes. Cost monitoring operates on billing cycles; cost runaway operates on seconds.

The Risk

Model APIs charge by the number of tokens processed (input and output tokens). An agent with recursive reasoning, multi-step delegation, or iterative refinement may consume many tokens per request. Cost is typically tracked on monthly billing cycles. An agent that causes token consumption to spike due to unexpected recursive reasoning or delegation loops may consume weeks' worth of budget in hours.

This creates a cost runaway risk: an agent that is supposed to cost $100/month due to anticipated usage patterns may cost $10,000/month if reasoning becomes unexpectedly recursive or if a prompt change causes the agent to delegate to multiple sub-agents per request. The institution discovers the cost overrun only when the monthly bill arrives.

The runaway risk is amplified by non-transparency: an institution may not know which agent is consuming the most tokens because token consumption is not tracked per-agent in real-time. Token consumption is reported at the API level, not at the application level.

How It Materializes

A bank's AML team deploys an agent to perform automated SAR (suspicious activity report) generation. The agent is expected to process 1,000 transactions per day, costing approximately $5,000/month in tokens. The institution budgets $5,000/month for the agent's usage.

Over several weeks, new requirements are added: the agent should now cross-reference transaction counterparties against an expanded sanctions watch list, query additional external data sources, and provide more detailed narrative explanation. Each requirement adds complexity and token consumption. The agent's reasoning becomes more recursive as it evaluates multiple data sources and refines its analysis.

After three months, the first bill arrives: $35,000 instead of $15,000 (expected 3-month cost). The agent's token consumption has increased 7x. The institution investigates and discovers the agent is now consuming approximately 100 tokens per transaction instead of the originally anticipated 15 tokens. The agent's recursive reasoning and additional data source queries caused the explosion.

The institution must either reduce the agent's functionality to bring costs back in line, or accept the new cost baseline. Either way, the institution has been surprised by cost overrun and lacks mechanisms to detect and control it.

DAMAGE Score Breakdown

DimensionScoreRationale
D - Detectability2Cost overrun is detected only when bills arrive. Real-time detection requires explicit per-agent cost monitoring.
A - Autonomy Sensitivity2Cost issues affect all autonomy levels; structural to token-based pricing.
M - Multiplicative Potential4Each additional agent, each recursive reasoning step, each delegation multiplies token consumption.
A - Attack Surface2Adversary could intentionally cause cost runaway, but runaway occurs naturally through feature expansion.
G - Governance Gap4Cost governance assumes costs track linearly with usage. Recursive reasoning causes superlinear cost scaling.
E - Enterprise Impact2Cost overruns are undesirable but typically manageable. Impact is financial, not safety-critical.
Composite DAMAGE Score3.5High. Requires priority attention and dedicated controls.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent TypeImpactHow This Risk Manifests
Digital AssistantLowHuman user may notice cost implications if explicitly aware of pricing.
Digital ApprenticeModerateProgressive autonomy increases token consumption frequency.
Autonomous AgentHighAutonomous agent consumes tokens continuously without human awareness of cost implications.
Delegating AgentCriticalAgent delegates frequently; each delegation consumes tokens. Recursive delegation causes exponential token consumption.
Agent Crew / PipelineCriticalMultiple agents in pipeline, each consuming tokens. Total cost is multiplicative.
Agent Mesh / SwarmCriticalPeer-to-peer agent network with dynamic delegation. Token consumption is unpredictable and potentially exponential.

Regulatory Framework Mapping

FrameworkCoverageCitationWhat It AddressesWhat It Misses
NIST CSF 2.0PartialGOVERN (Resource Mgmt)Addresses resource management and cost governance.Does not specifically address AI token consumption.
BCBS 239MinimalOperational RiskGeneral operational risk governance.Does not address AI cost management.
EU AI ActMinimalGeneral governanceGeneral governance principles.Does not address cost management.
MAS AIRGMinimalGeneral governanceGeneral governance principles.Does not address cost monitoring.

Why This Matters in Regulated Industries

Cost control is an operational governance requirement. An institution that cannot control or predict its costs for critical systems violates operational risk management principles. Regulators expect institutions to have cost budgets, cost tracking, and cost controls for all material expenses. An institution that experiences 7x cost overruns without detection or control demonstrates weak cost governance.

Additionally, token costs are directly correlated with operational risk. The more tokens an agent consumes, the longer it takes to process requests, the more risk of cost escalation, the more risk of service degradation if costs become prohibitive.

Controls & Mitigations

Design-Time Controls

  • Estimate token consumption per agent: for each agent use case, estimate input and output token consumption per request. Calculate expected monthly costs.
  • Implement token budgets: set monthly token budgets for each agent or agent group. Define budget alert thresholds (e.g., alert at 50%, 75%, 90%).
  • Design agents to minimize token consumption: use techniques like caching, summarization, and data filtering to reduce context size.
  • Implement cost-aware reasoning: design agents to be aware of cost implications of recursive reasoning and to limit recursion depth.

Runtime Controls

  • Monitor token consumption per agent: instrument all API calls to track tokens consumed by each agent. Log tokens in real-time.
  • Set token consumption limits: enforce hard limits on tokens per request or per agent per day. If limits are exceeded, halt the agent.
  • Implement cost alerting: alert operations team when an agent exceeds 50%, 75%, 90% of monthly token budget.
  • Use Component 10 (Kill Switch) to halt agents that exceed token limits. Require re-authorization before re-enabling.

Detection & Response

  • Monitor costs in real-time: track daily token consumption costs, compare to monthly budget run-rate. Alert if run-rate exceeds budget by >20%.
  • Conduct quarterly cost reviews: analyze token consumption by agent, identify high-cost agents, investigate whether consumption is justified.
  • Monitor for cost anomalies: detect sudden changes in per-agent token consumption, investigate root causes.
  • Establish incident response for cost overruns: identify cause (agent change, usage increase, reasoning change), determine whether to reduce functionality or increase budget.

Related Risks

Address This Risk in Your Institution

Token Economics and Cost Runaway requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.

Schedule a Briefing