Agent reports high statistical confidence in a conclusion built on invalid premises; confidence score does not reflect premise integrity.
An agent might report a conclusion with high confidence (e.g., 95% confidence) even though the conclusion is built on invalid or unverified premises. The 95% confidence score refers to the statistical strength of the model, not the validity of the premises. This is dangerous because downstream humans or systems treat confidence scores as signals of reliability.
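One way to keep the two signals apart is to carry them as separate fields and always report both. The sketch below is a minimal illustration, not a standard schema. The `Conclusion`, `Premise`, and `PremiseStatus` names are hypothetical. Measuring premise integrity as the fraction of verified premises is also a simplifying assumption, since a single failed material premise may be enough to invalidate a conclusion.

```python
from dataclasses import dataclass, field
from enum import Enum


class PremiseStatus(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    FAILED = "failed"


@dataclass
class Premise:
    name: str
    status: PremiseStatus


@dataclass
class Conclusion:
    claim: str
    model_confidence: float  # statistical strength of the model, 0.0 to 1.0
    premises: list[Premise] = field(default_factory=list)

    def premise_integrity(self) -> float:
        """Fraction of premises independently verified (hypothetical metric)."""
        if not self.premises:
            return 0.0  # no recorded premises means nothing has been verified
        verified = sum(p.status is PremiseStatus.VERIFIED for p in self.premises)
        return verified / len(self.premises)

    def report(self) -> str:
        # Report both signals side by side instead of collapsing them into one number.
        return (
            f"{self.claim} | model confidence {self.model_confidence:.0%} "
            f"| premises verified {self.premise_integrity():.0%}"
        )


c = Conclusion("APPROVE", 0.95, [
    Premise("employment", PremiseStatus.UNVERIFIED),
    Premise("credit_score", PremiseStatus.UNVERIFIED),
    Premise("stated_debt", PremiseStatus.UNVERIFIED),
])
print(c.report())  # prints APPROVE | model confidence 95% | premises verified 0%
```

Presented this way, a 95% model score next to 0% verified premises makes the gap visible to whoever acts on the output.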
This is fundamentally agentic because agents are designed to output confidence scores that humans will act on. A traditional system that provides no confidence score at all is less dangerous than an agent that provides a misleading confidence score.
A bank's credit decision agent analyzes a credit application and produces: "APPROVE: applicant is creditworthy with 94% confidence."
The confidence score comes from a logistic regression model trained on historical credit data. The model achieved 94% accuracy on its training set, so the agent reports 94% confidence: an aggregate figure about past data, presented as if it described this individual decision. However, the premises of the decision are invalid: the applicant's stated employment was not verified, the credit score comes from a source with known stale-data issues, and the debt-to-income ratio is calculated from unverified, self-reported debt.
None of the premises have been validated. The agent's 94% confidence refers to the model's historical accuracy, not to the validity of the premises or the reliability of the current decision.
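The gap between the model's accuracy and the reliability of the current decision can be made concrete with the law of total probability. Suppose the reported figure is read generously as the probability that the conclusion is correct *given* that its premises hold. Training-set accuracy usually overstates even that. Because the term for the invalid-premise case is non-negative, the unconditional probability is bounded below: P(correct) ≥ P(correct | premises hold) × P(premises hold). The sketch assumes independent premise validity probabilities. The function name and the 0.7 per-premise estimates are hypothetical, not figures from the case.

```python
import math


def premise_adjusted_confidence(model_confidence: float,
                                premise_validity: dict[str, float]) -> float:
    """Lower bound on P(conclusion correct) once premise validity is considered.

    Assumes model_confidence = P(correct | all premises hold) and that the
    premise validity probabilities are independent. The dropped term,
    P(correct | some premise fails) * P(some premise fails), is non-negative,
    so the product below is a lower bound rather than an exact value.
    """
    for name, p in premise_validity.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"premise {name!r} has invalid probability {p}")
    p_all_hold = math.prod(premise_validity.values())  # 1.0 when there are no premises
    return model_confidence * p_all_hold


# Hypothetical validity estimates for the three unverified premises in the example.
premises = {"employment": 0.7, "credit_score": 0.7, "stated_debt": 0.7}
print(round(premise_adjusted_confidence(0.94, premises), 2))  # prints 0.32
```

Under these assumptions, the only guarantee the evidence supports is about 32%, not 94%. The headline number measures the model, not the decision.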
The loan is approved and disbursed. Three months later, the bank discovers that the applicant's income was fraudulently stated and that their actual debt is much higher, and it suffers a loss. The subsequent regulatory examination finds that the bank approved the loan on the strength of the agent's 94% confidence recommendation without verifying any of its premises.
| Dimension | Score | Rationale |
|---|---|---|
| D - Detectability | 4 | Premise invalidity is invisible if the confidence score is treated as ground truth. |
| A - Autonomy Sensitivity | 5 | Agent produces confidence scores that drive downstream decision-making autonomously. |
| M - Multiplicative Potential | 4 | Impact scales with number of decisions made with invalid-premise confidence scores. |
| A - Attack Surface | 5 | Any agent that outputs confidence scores is vulnerable. |
| G - Governance Gap | 5 | No standard framework requires validation of premises before reporting confidence scores. |
| E - Enterprise Impact | 4 | Credit losses, regulatory findings, requirement to implement premise validation. |
| Composite DAMAGE Score | 3.6 | High. Requires priority attention and dedicated controls. |
How severity changes across the agent architecture spectrum.
| Agent Type | Impact | How This Risk Manifests |
|---|---|---|
| Digital Assistant | Low | Human reviews premises before acting on confidence score. |
| Digital Apprentice | Medium | Apprentice governance requires premise validation. |
| Autonomous Agent | Critical | Agent outputs confidence score based on model accuracy, not premise validity. |
| Delegating Agent | High | Agent invokes tools and passes along confidence scores; tool premises are not validated. |
| Agent Crew / Pipeline | Critical | Multiple agents in sequence propagate invalid-premise confidence scores. |
| Agent Mesh / Swarm | Critical | Agents share confidence scores; invalid premises propagate through mesh. |
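The crew, pipeline, and mesh rows share one mechanism: each agent consumes upstream output as if its premises had been checked. A minimal sketch of the opposite discipline assumes each stage records which premises it has not verified. Every stage then inherits the union of its upstream stages' unverified premises, so a downstream agent's high confidence cannot erase an upstream gap. `StageOutput` and `run_stage` are hypothetical names, not part of any agent framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageOutput:
    agent: str
    model_confidence: float
    unverified_premises: frozenset[str]


def run_stage(agent: str, model_confidence: float,
              own_unverified: set[str],
              upstream: list[StageOutput]) -> StageOutput:
    """Produce a stage output that inherits every upstream unverified premise.

    A downstream agent cannot turn an upstream premise into a verified one
    simply by reporting its own high confidence.
    """
    inherited = frozenset().union(*(u.unverified_premises for u in upstream))
    return StageOutput(agent, model_confidence, inherited | frozenset(own_unverified))


# Hypothetical three-stage credit pipeline.
intake = run_stage("intake", 0.90, {"stated_income"}, [])
scoring = run_stage("scoring", 0.94, set(), [intake])
decision = run_stage("decision", 0.97, set(), [scoring])
print(sorted(decision.unverified_premises))  # prints ['stated_income']
```

Because `upstream` is a list, the same function covers a mesh, where a stage may consume several peers and inherits the gaps of all of them.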
| Framework | Coverage | Citation | What It Addresses | What It Misses |
|---|---|---|---|---|
| SR 11-7 / MRM | Addressed | Model Risk Management (Section 2) | Expects models to be validated and assumptions tested. | Does not address confidence-validity confusion in agent outputs. |
| GLBA | Partial | 16 CFR Part 314 (FTC Safeguards Rule) | Requires an information security program to safeguard customer information. | Does not address premise validation for automated credit decisions. |
| Fair Lending Laws | Partial | Various fair lending regulations | Require validated information in credit decisions. | Do not address confidence score misuse. |
| NIST AI RMF 1.0 | Partial | MEASURE.1 | Recommends measuring model performance. | Does not distinguish model accuracy from premise validity. |
Regulators in banking and credit expect that decisions are based on verified information. When a bank approves credit based on an agent's confidence score, the regulator expects that the confidence score reflects the validity of the premises, not just the statistical accuracy of the model.
If an agent reports high confidence based on unverified premises, and a downstream human decision-maker acts on that confidence, the organization has failed in its verification controls. Under SR 11-7, this is a model governance failure that requires corrective action.
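A verification control of this kind can be sketched as a gate. The model's recommendation drives an automated action only when every material premise has been verified. Otherwise the case goes to a human, and the model confidence passes through unchanged so reviewers see both signals. The premise names, the `REFER_TO_HUMAN` action, and the `gate_decision` function are hypothetical illustrations, not requirements drawn from SR 11-7.

```python
from enum import Enum


class Status(Enum):
    VERIFIED = "verified"
    UNVERIFIED = "unverified"
    FAILED = "failed"


# Hypothetical material premises for the credit example; each institution defines its own.
MATERIAL_PREMISES = ("employment", "credit_score_freshness", "stated_debt")


def gate_decision(model_decision: str, model_confidence: float,
                  premises: dict[str, Status]) -> dict:
    """Allow an automated action only if every material premise is verified.

    A premise missing from `premises` is treated as not verified. The model
    confidence is reported unchanged alongside the premise findings.
    """
    not_verified = [p for p in MATERIAL_PREMISES
                    if premises.get(p) is not Status.VERIFIED]
    failed = [p for p in not_verified if premises.get(p) is Status.FAILED]
    return {
        "action": model_decision if not not_verified else "REFER_TO_HUMAN",
        "model_confidence": model_confidence,
        "premises_not_verified": not_verified,
        "premises_failed": failed,
    }


result = gate_decision("APPROVE", 0.94, {
    "employment": Status.UNVERIFIED,
    "credit_score_freshness": Status.UNVERIFIED,  # source has known stale-data issues
    "stated_debt": Status.UNVERIFIED,
})
print(result["action"])  # prints REFER_TO_HUMAN
```

Treating a missing premise as unverified is the conservative default: an agent that omits a premise from its report gains nothing by doing so.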
Confidence-Validity Confusion requires architectural controls that go beyond what existing frameworks provide.