Six dimensions calibrated for highly regulated industries, where the baseline consequences of agent failure include examination findings, enforcement actions, and capital impacts.
Existing AI risk frameworks were not built for agentic systems operating inside regulated institutions. Frameworks like the NIST AI RMF and ISO 42001 provide valuable structure for general AI governance, but they do not score risks along the dimensions that matter most when autonomous agents operate within prudential supervision regimes. They were designed before agents could accumulate authority at runtime, cascade failures across organizational boundaries, or create governance gaps that no existing standard addresses.
The DAMAGE framework was developed to fill this gap. Each of the six dimensions captures a property of agentic AI risk that is directly relevant to regulated financial institutions: how visible the risk is before it materializes, how it behaves as autonomy increases, whether it compounds across systems, how exposed it is to adversarial exploitation, whether current governance addresses it, and what the worst-case regulatory consequence looks like. These are the questions that chief risk officers, model risk managers, and regulators need answered.
The composite DAMAGE score is the average of all six dimensions, each scored from 1 (low risk) to 5 (high risk). This produces a single comparable metric across all 133 risks in the catalog, enabling institutions to prioritize control investments, benchmark their exposure, and communicate risk posture to boards and supervisors in a consistent vocabulary.
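As a concrete illustration, here is a minimal sketch of the composite calculation in Python. The class and field names are our own, not part of any published DAMAGE reference implementation.

```python
from dataclasses import dataclass, fields

@dataclass
class DamageScore:
    """One DAMAGE assessment: six dimensions, each scored 1 (low risk) to 5 (high risk)."""
    detectability: int
    autonomy_sensitivity: int
    multiplicative_potential: int
    attack_surface: int
    governance_gap: int
    enterprise_impact: int

    def composite(self) -> float:
        """Composite DAMAGE score: the unweighted mean of the six dimension scores."""
        values = [getattr(self, f.name) for f in fields(self)]
        if not all(1 <= v <= 5 for v in values):
            raise ValueError("each dimension must be scored 1-5")
        return sum(values) / len(values)

# Example: a risk that is hard to govern and carries severe regulatory impact.
risk = DamageScore(detectability=4, autonomy_sensitivity=5, multiplicative_potential=3,
                   attack_surface=2, governance_gap=5, enterprise_impact=5)
print(risk.composite())  # 4.0
```

Because the composite is an unweighted mean, two risks with very different dimension profiles can share the same score, so the per-dimension values remain worth reporting alongside it.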
**Detectability:** How difficult is this risk to detect before harm occurs?
Detectability measures the gap between when a risk becomes active and when existing monitoring and alerting systems can identify it. In agentic systems, many risks are structurally invisible to standard observability tooling because they emerge from the interaction between agents, tools, and accumulated context rather than from any single observable event.
- **Score 1:** Real-time alerting exists; standard monitoring catches it.
- **Score 3:** Detectable with dedicated monitoring but not standard tooling.
- **Score 5:** Invisible until post-incident forensics; passes all standard metrics.
**Why this matters in regulated industries:** Supervisors expect institutions to demonstrate that material risks are identified and reported in a timely manner. A risk that is invisible until post-incident forensics means the institution may already be in breach of its reporting obligations before it even knows the event occurred. Under the MAS Notice on Technology Risk Management and similar regimes, delayed detection can itself become an examination finding.
**Autonomy Sensitivity:** Does this risk worsen as agent autonomy increases?
Autonomy Sensitivity captures how a risk behaves as institutions move from human-in-the-loop to human-on-the-loop to fully autonomous agent operations. Some risks remain constant regardless of the autonomy level; others scale exponentially, remaining manageable at low autonomy but becoming catastrophic when agents operate with minimal human oversight.
- **Score 1:** Risk is constant regardless of autonomy level.
- **Score 3:** Risk increases linearly with autonomy.
- **Score 5:** Risk scales exponentially with autonomy; minimal at low autonomy, catastrophic at high.
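A stylized sketch of the three anchor profiles above; the curve shapes and the exponent are illustrative assumptions, not values the framework prescribes.

```python
import math

def scaled_risk(anchor: int, autonomy: float, base: float = 1.0) -> float:
    """Stylized risk magnitude as autonomy rises from 0.0 (human-in-the-loop)
    to 1.0 (fully autonomous). Functional forms are illustrative assumptions."""
    if anchor == 1:
        return base                           # constant: insensitive to autonomy
    if anchor == 3:
        return base * (1.0 + autonomy)        # linear growth with autonomy
    return base * math.exp(4.0 * autonomy)    # exponential: small at 0, ~55x base at 1

for autonomy in (0.1, 0.5, 1.0):
    print(autonomy, round(scaled_risk(5, autonomy), 1))  # 1.5, 7.4, 54.6
```

The exponential profile is what makes pilots misleading: at autonomy 0.1 the risk looks near baseline, while at full autonomy it is roughly 55 times worse under these illustrative parameters.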
**Why this matters in regulated industries:** Financial institutions are under competitive pressure to increase agent autonomy for speed and efficiency. Autonomy-sensitive risks create a hidden cliff: operations that appear safe during pilot programs can become uncontrollable when deployed at production scale with reduced human oversight. Regulators increasingly require institutions to demonstrate they understand how risk profiles change as automation levels increase.
**Multiplicative Potential:** Does this risk compound with other risks or across agents?
Multiplicative Potential measures whether a risk stays contained or propagates. In multi-agent architectures, a failure at a single point can trigger cascading effects that cross process boundaries, organizational functions, and even institutional borders. This dimension distinguishes risks that fail gracefully from risks that fail catastrophically.
- **Score 1:** Isolated; affects single process or task.
- **Score 3:** Affects adjacent processes; moderate propagation.
- **Score 5:** Cascading; triggers chain reaction across agents, systems, and organizational boundaries.
**Why this matters in regulated industries:** Operational resilience frameworks like DORA and the Basel Committee's Principles for Operational Resilience specifically require institutions to map interdependencies and prevent cascading failures. A risk with high multiplicative potential threatens not just the originating process but the institution's ability to maintain critical business services, which is the core concern of operational resilience regulation.
**Attack Surface:** How exposed is this risk to adversarial exploitation?
Attack Surface measures how accessible a risk is to intentional exploitation by adversaries. Agentic AI systems introduce novel attack vectors that did not exist in traditional software: prompt injection, tool-use manipulation, inter-agent deception, and context poisoning. This dimension assesses whether exploitation requires insider access or can be achieved remotely through public-facing interfaces.
- **Score 1:** Requires physical or internal access; low exploitability.
- **Score 3:** Exploitable through standard internal interfaces.
- **Score 5:** Remotely exploitable via public-facing agent interfaces or data inputs.
**Why this matters in regulated industries:** Financial institutions are high-value targets for sophisticated adversaries. Regulators require robust cyber risk management programs and penetration testing regimes. A risk that is remotely exploitable through public-facing agent interfaces represents a fundamentally different threat profile than one requiring insider access. Under frameworks like the ECB's TIBER-EU and the Hong Kong Monetary Authority's iCAST, institutions must demonstrate they understand and test against their actual attack surface.
**Governance Gap:** How well do current frameworks address this risk?
Governance Gap measures the maturity of existing controls and regulatory guidance for a given risk. Some agentic AI risks fall neatly within established frameworks like the NIST AI RMF, ISO 42001, the EU AI Act, and MAS AIRG. Others are so novel that no production framework addresses them at all, requiring institutions to develop governance from first principles.
- **Score 1:** Mature controls exist in the NIST AI RMF, ISO 42001, the EU AI Act, or MAS AIRG.
- **Score 3:** Partially addressed by one or two frameworks; significant gaps remain.
- **Score 5:** No production framework addresses this risk; novel governance required.
**Why this matters in regulated industries:** Institutions operating under prudential supervision cannot simply acknowledge a risk and move on. They must demonstrate to examiners that appropriate controls are in place. When no established framework addresses a risk, the institution bears the full burden of designing, implementing, and defending its own governance approach. These are the risks most likely to generate examination findings, because supervisors have no benchmark for what "adequate" looks like and institutions have no playbook to follow.
**Enterprise Impact:** What is the maximum blast radius if the risk materializes in a regulated institution?
Enterprise Impact measures the worst-case consequence of a risk event, calibrated specifically for institutions under prudential oversight. Unlike generic impact scales, this dimension is anchored to the outcomes that regulated institutions actually face: examination findings, enforcement actions, consent orders, capital adequacy impacts, and license risk. A score of 5 on this dimension means the risk, if materialized, could threaten the institution's operating authority.
- **Score 1:** Single task failure; contained to one process; no regulatory exposure.
- **Score 3:** Department-level impact; multiple processes affected; potential examination finding.
- **Score 5:** Institution-wide consequences: enforcement action, consent order, capital adequacy impact, license risk.
**Why this matters in regulated industries:** Generic risk frameworks often top out at "significant financial loss" or "reputational damage." For regulated institutions, the consequences escalate far beyond that. An enforcement action can restrict business activities. A consent order imposes ongoing supervisory requirements. Capital adequacy impacts affect the institution's ability to lend and operate. License risk is existential. This dimension ensures that the scoring framework reflects the actual stakes that boards and senior management face.
The composite DAMAGE score maps each risk to a priority tier that determines the required governance response. Across the 133 risks in the catalog, the distribution reflects the reality that agentic AI in regulated industries is overwhelmingly a high-risk proposition.
- **Critical:** Requires immediate architectural controls; cannot be accepted.
- **High:** Requires governance framework, monitoring, and mitigation plan.
- **Moderate:** Addressable with standard controls and periodic review.
- **Low:** Manageable with existing practices.
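A minimal sketch of the tier mapping, assuming illustrative score thresholds (the catalog's actual tier cutoffs are not reproduced here), together with a tally over a hypothetical set of composite scores:

```python
from collections import Counter

def tier(composite: float) -> str:
    """Map a composite DAMAGE score to a priority tier.
    Threshold values are illustrative assumptions, not the catalog's official cutoffs."""
    if composite >= 4.0:
        return "Critical"
    if composite >= 3.0:
        return "High"
    if composite >= 2.0:
        return "Moderate"
    return "Low"

# Tally a hypothetical scored catalog into tiers (placeholder data, not the real catalog).
catalog = [4.3, 3.8, 4.0, 2.7, 3.2]
print(Counter(tier(score) for score in catalog))
# Counter({'Critical': 2, 'High': 2, 'Moderate': 1})
```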
Over 85% of risks in the catalog score as High or Critical. No risk in the catalog scores below Moderate. This distribution is not an artifact of conservative scoring; it reflects the structural reality that agentic AI introduces fundamentally new risk properties into environments with zero tolerance for uncontrolled failure.
Generic AI risk scoring systems were designed for different problems. OWASP's Risk Rating Methodology scores security vulnerabilities in web applications. The NIST AI RMF provides a broad governance structure but does not produce comparable risk scores. ISO 42001 establishes management system requirements without prescribing how to rank individual risks. None of these frameworks accounts for the properties that make agentic AI uniquely dangerous in regulated settings: authority accumulation, multi-agent cascading, autonomous decision-making at scale, and the governance vacuum around novel agent behaviors.
DAMAGE is calibrated specifically for prudential supervision. The scoring anchors are drawn from the regulatory consequences that financial institutions actually face. A score of 5 on Enterprise Impact does not mean "high financial loss" in the abstract; it means enforcement action, consent order, and license risk. A score of 5 on Governance Gap does not mean "needs further research"; it means no production framework addresses this risk and the institution must build governance from scratch while regulators watch. This specificity makes DAMAGE scores actionable for the people who need them most: CROs, model risk managers, compliance officers, and the boards they report to.
| Attribute | Generic AI Risk Scoring | DAMAGE Framework |
|---|---|---|
| Designed for | Enterprise AI broadly | Agentic AI in regulated institutions |
| Impact anchors | Financial loss, reputational damage | Examination findings, enforcement actions, capital impacts, license risk |
| Autonomy consideration | Not scored separately | Dedicated dimension (Autonomy Sensitivity) |
| Cascading risk | Often treated as secondary effect | Dedicated dimension (Multiplicative Potential) |
| Governance maturity | Assumed or not scored | Dedicated dimension (Governance Gap) |
| Comparable scores | Often qualitative (Low/Med/High) | Quantitative 1-5 composite across all risks |
Explore the full catalog of 133 scored risks, or schedule a briefing to discuss how the DAMAGE framework applies to your institution's agentic AI program.