R-RE-09 Reasoning & Epistemic DAMAGE 3.6 / High

Confidence-Validity Confusion

Agent reports high statistical confidence in a conclusion built on invalid premises; confidence score does not reflect premise integrity.

The Risk

An agent may report a conclusion with high confidence (e.g., 95%) even though the conclusion rests on invalid or unverified premises. The score describes the statistical strength of the model, not the validity of the premises fed into it. This is dangerous because downstream humans and systems treat confidence scores as signals of reliability.

This is fundamentally agentic because agents are designed to output confidence scores that humans will act on. A traditional system that provides no confidence score at all is less dangerous than an agent that provides a misleading confidence score.

How It Materializes

A bank's credit decision agent analyzes a credit application and produces: "APPROVE: applicant is creditworthy with 94% confidence."

The confidence score comes from a logistic regression model trained on historical credit data. The model has 94% accuracy on the training set, so the agent reports 94% confidence. However, the premises of the decision are invalid: the applicant's stated employment was not verified, the credit score comes from a source with known stale-data issues, and the debt-to-income ratio is calculated from unverified, self-reported debt.

None of the premises have been validated. The agent's 94% confidence refers to the model's historical accuracy, not to the validity of the premises or the reliability of the current decision.
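
The gap in this scenario can be made concrete with a minimal Python sketch. The names `Premise`, `naive_report`, and `unverified` are hypothetical and used only for illustration; they do not describe any real credit system. The point is that the reporting function never consults the premises, so the message reads the same whether zero or all of them are verified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Premise:
    """One input the decision relies on (hypothetical structure)."""
    name: str
    verified: bool


def naive_report(model_accuracy: float, premises: list[Premise]) -> str:
    # Failure mode: `premises` is accepted but never inspected, so the
    # reported confidence is just the model's historical accuracy.
    return f"APPROVE: applicant is creditworthy with {model_accuracy:.0%} confidence."


def unverified(premises: list[Premise]) -> list[str]:
    """Names of premises that were assumed rather than verified."""
    return [p.name for p in premises if not p.verified]


# The three premises from the scenario above, none of them verified.
premises = [
    Premise("stated employment", verified=False),
    Premise("credit score from source with stale data", verified=False),
    Premise("debt-to-income ratio from stated debt", verified=False),
]

print(naive_report(0.94, premises))
print("Unverified premises:", unverified(premises))
```

The report and the list of unverified premises are computed independently, which is exactly why a reader of the first line has no way to learn about the second.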

The loan is approved and disbursed. Three months later, the bank discovers that the applicant's income was fraudulently stated and that their actual debt is much higher, and it suffers a loss. The subsequent regulatory examination finds that the bank approved the loan on the strength of an agent's 94% confidence recommendation without verifying any of the premises.

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 4 | Premise invalidity is invisible if confidence score is treated as ground truth. |
| A - Autonomy Sensitivity | 5 | Agent produces confidence scores that drive downstream decision-making autonomously. |
| M - Multiplicative Potential | 4 | Impact scales with number of decisions made with invalid-premise confidence scores. |
| A - Attack Surface | 5 | Any agent that outputs confidence scores is vulnerable. |
| G - Governance Gap | 5 | No standard framework requires validation of premises before reporting confidence scores. |
| E - Enterprise Impact | 4 | Credit losses, regulatory findings, requirement to implement premise validation. |
| Composite DAMAGE Score | 3.6 | High. Requires priority attention and dedicated controls. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human reviews premises before acting on confidence score. |
| Digital Apprentice | Medium | Apprentice governance requires premise validation. |
| Autonomous Agent | Critical | Agent outputs confidence score based on model accuracy, not premise validity. |
| Delegating Agent | High | Agent invokes tools and passes along confidence scores; tool premises are not validated. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequence propagate invalid-premise confidence scores. |
| Agent Mesh / Swarm | Critical | Agents share confidence scores; invalid premises propagate through mesh. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| SR 11-7 / MRM | Addressed | Model Risk Management (Section 2) | Expects models to be validated and assumptions tested. | Does not address confidence-validity confusion in agent outputs. |
| GLBA | Partial | 16 CFR Part 314 | Requires safeguards for decision-making. | Does not specify premise validation. |
| Fair Lending Laws | Partial | Various fair lending regulations | Require validated information in credit decisions. | Do not address confidence score misuse. |
| NIST AI RMF 1.0 | Partial | MEASURE.1 | Recommends measuring model performance. | Does not distinguish model accuracy from premise validity. |

Why This Matters in Regulated Industries

Regulators in banking and credit expect that decisions are based on verified information. When a bank approves credit based on an agent's confidence score, the regulator expects that the confidence score reflects the validity of the premises, not just the statistical accuracy of the model.

If an agent reports high confidence based on unverified premises, and a downstream human decision-maker acts on that confidence, the organization has failed in its verification controls. Under SR 11-7, this is a model governance failure that requires corrective action.

Controls & Mitigations

Design-Time Controls

  • Separate model accuracy from premise validity: ensure that agents treat "how accurate is the model?" and "how valid are the premises this decision is based on?" as two distinct questions.
  • Implement premise validation before decision: require that critical premises are validated before the agent produces a final decision.
  • Implement confidence scoring that accounts for premise validity: design confidence scores to incorporate both model accuracy and premise validity.
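
One way to sketch the design-time controls above is to stop emitting a single number and instead carry the model score and the premise status side by side. The following Python is an illustrative assumption, not a prescribed method: `QualifiedConfidence`, `qualify`, and the rule "withhold the headline confidence unless every critical premise is verified" are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Premise:
    name: str
    verified: bool
    critical: bool = True  # assumption: premises are critical unless marked otherwise


@dataclass(frozen=True)
class QualifiedConfidence:
    model_score: float      # statistical output of the model
    verified_critical: int  # critical premises that were verified
    total_critical: int     # critical premises the decision relies on

    @property
    def premises_valid(self) -> bool:
        # True when no critical premise is left unverified
        # (including the case where there are no critical premises).
        return self.verified_critical == self.total_critical

    def headline(self) -> str:
        coverage = (f"{self.verified_critical} of {self.total_critical} "
                    "critical premises verified")
        if self.premises_valid:
            return f"{self.model_score:.0%} model confidence; {coverage}"
        # Assumed policy: never lead with the model score when premises are unverified.
        return f"confidence withheld; {coverage} (model score {self.model_score:.0%})"


def qualify(model_score: float, premises: list[Premise]) -> QualifiedConfidence:
    """Attach premise status to a model score instead of reporting the score alone."""
    critical = [p for p in premises if p.critical]
    return QualifiedConfidence(
        model_score=model_score,
        verified_critical=sum(p.verified for p in critical),
        total_critical=len(critical),
    )
```

Keeping the two quantities separate avoids inventing a blended formula; whether and how to combine them into one score is a policy decision for the institution.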

Runtime Controls

  • Log premise validity status: for each premise the agent relies on, log whether the premise is verified or assumed.
  • Monitor for invalid-premise high-confidence scores: detect when agents produce high-confidence scores based on unverified premises, and alert compliance for review.
  • Implement premise validation gating: before an agent issues a final decision, check that all critical premises are verified.
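
The logging and gating controls above could look roughly like the following Python sketch. It uses only the standard `logging` module; the logger name, the `gate_decision` function, and the `"FINAL"` / `"HOLD_FOR_VERIFICATION"` outcomes are hypothetical names chosen for illustration.

```python
import logging
from dataclasses import dataclass

logger = logging.getLogger("premise_gate")  # hypothetical logger name


@dataclass(frozen=True)
class Premise:
    name: str
    verified: bool
    critical: bool = True


def gate_decision(decision_id: str, premises: list[Premise]) -> str:
    """Log every premise's status; release the decision only if all critical ones are verified."""
    blocking = []
    for p in premises:
        status = "verified" if p.verified else "assumed"
        logger.info("decision=%s premise=%s status=%s critical=%s",
                    decision_id, p.name, status, p.critical)
        if p.critical and not p.verified:
            blocking.append(p.name)
    if blocking:
        # A held decision is the signal that should reach compliance review.
        logger.warning("decision=%s held; unverified critical premises: %s",
                       decision_id, ", ".join(blocking))
        return "HOLD_FOR_VERIFICATION"
    return "FINAL"
```

In the credit scenario, `gate_decision("A-1", [Premise("stated employment", verified=False)])` returns `"HOLD_FOR_VERIFICATION"`, so the approval would not be issued until employment is confirmed.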

Detection & Response

  • Audit premise validity: periodically sample agent decisions and verify that the premises they were based on are actually valid.
  • Investigate confidence-validity divergence: if agents are consistently producing high confidence scores based on invalid premises, investigate the agent's design.
  • Implement decision reversal for invalid premises: if decisions are found to rest on invalid premises, reverse them and redo them using only verified premises.
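
For the audit and divergence checks above, a minimal Python sketch follows. It assumes decision records already carry the reported confidence and the names of premises logged as assumed; `DecisionRecord`, `flag_divergent`, `divergence_rate`, and the 0.90 threshold are hypothetical and would need to be set by the institution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    reported_confidence: float
    unverified_premises: tuple[str, ...]  # premise names logged as "assumed"


def flag_divergent(records, threshold=0.90):
    """Decisions reported with high confidence despite unverified premises."""
    return [r for r in records
            if r.reported_confidence >= threshold and r.unverified_premises]


def divergence_rate(records, threshold=0.90):
    """Share of high-confidence decisions that rest on unverified premises."""
    high = [r for r in records if r.reported_confidence >= threshold]
    if not high:
        return 0.0
    return len(flag_divergent(high, threshold)) / len(high)


records = [
    DecisionRecord("A-1", 0.94, ("stated employment", "stated debt")),
    DecisionRecord("A-2", 0.95, ()),
    DecisionRecord("A-3", 0.60, ("stated employment",)),
]
print([r.decision_id for r in flag_divergent(records)])
print(divergence_rate(records))
```

The flagged list is the candidate set for reversal and re-decision; a persistently high rate points at the agent's design rather than at individual cases.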
