R-RE-09 Reasoning & Epistemic DAMAGE 3.6 / High

Confidence-Validity Confusion

Agent reports high statistical confidence in a conclusion built on invalid premises; confidence score does not reflect premise integrity.

The Risk

An agent may report a conclusion with high confidence (for example, 95%) even though the conclusion rests on invalid or unverified premises. The 95% figure describes the statistical strength of the model, not the validity of its inputs. This is dangerous because downstream humans and systems treat confidence scores as signals of reliability.

This is fundamentally agentic because agents are designed to output confidence scores that humans will act on. A traditional system that provides no confidence score at all is less dangerous than an agent that provides a misleading confidence score.

How It Materializes

A bank's credit decision agent analyzes a credit application and produces: "APPROVE: applicant is creditworthy with 94% confidence."

The confidence score comes from a logistic regression model trained on historical credit data. The model has 94% accuracy on the training set, so the agent reports 94% confidence. However, the premises of the decision are invalid: the applicant's stated employment was not verified, the credit score comes from a source with known stale-data issues, and the debt-to-income ratio is calculated from unverified stated debt.

None of the premises have been validated. The agent's 94% confidence refers to the model's historical accuracy, not to the validity of the premises or the reliability of the current decision.
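The gap described above can be sketched in a few lines. This is a minimal illustration, not the bank's actual model: the `Premise` type, the `verified` flag, the weights, and the input values are all hypothetical. It shows that a logistic score computed from premise values alone is identical whether or not those values were ever verified.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Premise:
    name: str
    value: float      # normalized input value (illustrative)
    verified: bool    # hypothetical flag; the model below never reads it


# Illustrative weights and bias, not taken from any real credit model.
WEIGHTS = {"income": 0.8, "credit_score": 1.2, "debt_to_income": -1.5}
BIAS = 0.5


def model_score(premises):
    """Logistic score computed from premise values only."""
    z = BIAS + sum(WEIGHTS[p.name] * p.value for p in premises)
    return 1.0 / (1.0 + math.exp(-z))


stated = [
    Premise("income", 1.0, verified=False),
    Premise("credit_score", 0.9, verified=False),
    Premise("debt_to_income", 0.3, verified=False),
]
# Same values, but marked as verified.
checked = [Premise(p.name, p.value, verified=True) for p in stated]

score_unverified = model_score(stated)
score_verified = model_score(checked)
# The two scores are equal: verification status has no effect on the output.
```

Because the score depends only on the values, any confidence derived from it carries no information about whether the premises were checked.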

The loan is approved and disbursed. Three months later, when the bank discovers that the applicant's income was fraudulently stated and their actual debt is much higher, the bank suffers a loss. The regulatory examination finds that the bank approved a loan based on an agent's 94% confidence recommendation, without verifying any of the premises.

DAMAGE Score Breakdown

Dimension	Score	Rationale
D - Detectability	4	Premise invalidity is invisible if confidence score is treated as ground truth.
A - Autonomy Sensitivity	5	Agent produces confidence scores that drive downstream decision-making autonomously.
M - Multiplicative Potential	4	Impact scales with number of decisions made with invalid-premise confidence scores.
A - Attack Surface	5	Any agent that outputs confidence scores is vulnerable.
G - Governance Gap	5	No standard framework requires validation of premises before reporting confidence scores.
E - Enterprise Impact	4	Credit losses, regulatory findings, requirement to implement premise validation.
Composite DAMAGE Score	3.6	High. Requires priority attention and dedicated controls.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type	Impact	How This Risk Manifests
Digital Assistant	Low	Human reviews premises before acting on confidence score.
Digital Apprentice	Medium	Apprentice governance requires premise validation.
Autonomous Agent	Critical	Agent outputs confidence score based on model accuracy, not premise validity.
Delegating Agent	High	Agent invokes tools and passes along confidence scores; tool premises are not validated.
Agent Crew / Pipeline	Critical	Multiple agents in sequence propagate invalid-premise confidence scores.
Agent Mesh / Swarm	Critical	Agents share confidence scores; invalid premises propagate through mesh.

Regulatory Framework Mapping

Framework	Coverage	Citation	What It Addresses	What It Misses
SR 11-7 / MRM	Addressed	Model Risk Management (Section 2)	Expects models to be validated and assumptions tested.	Does not address confidence-validity confusion in agent outputs.
GLBA	Partial	16 CFR Part 314	Requires safeguards for customer information.	Does not specify premise validation.
Fair Lending Laws	Partial	Various fair lending regulations	Require validated information in credit decisions.	Do not address confidence score misuse.
NIST AI RMF 1.0	Partial	MEASURE 1	Recommends measuring model performance.	Does not distinguish model accuracy from premise validity.

Why This Matters in Regulated Industries

Regulators in banking and credit expect that decisions are based on verified information. When a bank approves credit based on an agent's confidence score, the regulator expects that the confidence score reflects the validity of the premises, not just the statistical accuracy of the model.

If an agent reports high confidence based on unverified premises, and a downstream human decision-maker acts on that confidence, the organization has failed in its verification controls. Under SR 11-7, this is a model governance failure that requires corrective action.

Controls & Mitigations

Design-Time Controls

Separate model accuracy from premise validity: design agents to treat "how accurate is the model?" and "how valid are the premises this decision is based on?" as two distinct questions.
Implement premise validation before decision: require that critical premises are validated before the agent produces a final decision.
Implement confidence scoring that accounts for premise validity: design confidence scores to incorporate both model accuracy and premise validity.
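One way to approach the design-time controls above is to report model confidence and premise validity as separate fields, with any combined figure derived explicitly. The sketch below is an assumption-laden illustration: `ConfidenceReport`, `build_report`, and the multiplicative `combined` heuristic are hypothetical names and choices, and the combined value is not a calibrated probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceReport:
    model_confidence: float   # "how accurate is the model?"
    verified_fraction: float  # "how valid are the premises?"
    unverified: tuple         # names of premises that were only assumed

    @property
    def combined(self) -> float:
        # Illustrative heuristic only: scale model confidence by the share
        # of premises that were verified. Not a calibrated probability.
        return self.model_confidence * self.verified_fraction


def build_report(model_confidence, premise_status):
    """premise_status maps premise name -> True if verified, False if assumed."""
    if not premise_status:
        raise ValueError("at least one premise is required")
    unverified = tuple(sorted(n for n, ok in premise_status.items() if not ok))
    verified_fraction = 1 - len(unverified) / len(premise_status)
    return ConfidenceReport(model_confidence, verified_fraction, unverified)


# The credit scenario from this entry: three premises, none verified.
report = build_report(
    0.94,
    {"employment": False, "credit_score": False, "debt_to_income": False},
)
```

Keeping the two fields separate means a downstream reviewer can see that 0.94 describes the model while the verified fraction describes the inputs, rather than reading a single number as both.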

Runtime Controls

Log premise validity status: for each premise the agent relies on, log whether the premise is verified or assumed.
Monitor for invalid-premise high-confidence scores: detect when agents produce high-confidence scores based on unverified premises. Alert for compliance review.
Implement premise validation gating: before an agent emits a final decision, check that all critical premises are verified.
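The three runtime controls above could be combined in a single gate, sketched below under stated assumptions. The function name `gate_decision`, the `HOLD_FOR_VERIFICATION` outcome, the logger name, and the 0.9 alert threshold are all hypothetical; a real deployment would route alerts to its own compliance tooling.

```python
import logging

logger = logging.getLogger("premise_gate")  # hypothetical logger name


def gate_decision(proposed, confidence, premise_status, critical,
                  alert_threshold=0.9):
    """Log premise status, alert on high-confidence/unverified cases, and
    block the proposed decision until all critical premises are verified.

    premise_status maps premise name -> True (verified) or False (assumed).
    critical is the set of premise names that must be verified.
    """
    # Control 1: log whether each premise is verified or assumed.
    for name, verified in sorted(premise_status.items()):
        logger.info("premise=%s status=%s", name,
                    "verified" if verified else "assumed")

    # A critical premise that is absent from premise_status counts as unverified.
    missing = sorted(n for n in critical if not premise_status.get(n, False))

    if missing:
        # Control 2: flag high confidence resting on unverified premises.
        if confidence >= alert_threshold:
            logger.warning("high confidence %.2f with unverified premises: %s",
                           confidence, ", ".join(missing))
        # Control 3: gate the decision.
        return {"decision": "HOLD_FOR_VERIFICATION", "blocked_by": missing}

    return {"decision": proposed, "blocked_by": []}


held = gate_decision(
    "APPROVE", 0.94,
    {"employment": False, "credit_score": True, "debt_to_income": False},
    critical={"employment", "debt_to_income"},
)
```

In the credit scenario, this gate would have held the 94% approval until employment and stated debt were verified.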

Detection & Response

Audit premise validity: periodically sample agent decisions and verify that the premises they were based on are actually valid.
Investigate confidence-validity divergence: if agents are consistently producing high confidence scores based on invalid premises, investigate the agent's design.
Implement decision reversal for invalid premises: if decisions are found to rest on invalid premises, reverse them and reissue them using only verified premises.
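The audit and investigation steps above can be sketched as a sampling routine plus a simple divergence metric. The record layout (`id`, `confidence`, `premises_valid`), the function names, and the 0.9 threshold are hypothetical; `premises_valid` is assumed to be filled in by a human auditor after re-checking each sampled decision.

```python
import random


def sample_for_audit(decisions, k, seed=0):
    """Draw a reproducible random sample of decisions for manual review."""
    rng = random.Random(seed)
    return rng.sample(decisions, min(k, len(decisions)))


def divergence_rate(audited, high_confidence=0.9):
    """Share of high-confidence decisions whose premises proved invalid."""
    high = [d for d in audited if d["confidence"] >= high_confidence]
    if not high:
        return 0.0
    invalid = [d for d in high if not d["premises_valid"]]
    return len(invalid) / len(high)


def decisions_to_reverse(audited):
    """IDs of audited decisions that rested on invalid premises."""
    return [d["id"] for d in audited if not d["premises_valid"]]


# Illustrative audit results, not real data.
audited = [
    {"id": "A1", "confidence": 0.94, "premises_valid": False},
    {"id": "A2", "confidence": 0.95, "premises_valid": True},
    {"id": "A3", "confidence": 0.60, "premises_valid": False},
    {"id": "A4", "confidence": 0.91, "premises_valid": True},
]
rate = divergence_rate(audited)
to_reverse = decisions_to_reverse(audited)
```

A persistently high divergence rate would point to a design problem in the agent rather than isolated bad inputs, which is the trigger for the investigation step.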
