R-RE-10 Reasoning & Epistemic DAMAGE 3.3 / High

Reasoning Non-Reproducibility

Same agent with same inputs produces different reasoning paths and different conclusions on successive runs; reasoning cannot be reproduced for audit or challenge.

The Risk

LLM-based agents operate with temperature and sampling parameters that introduce randomness. When an agent is asked the same question twice with identical inputs, it may produce different reasoning paths and different conclusions. This randomness is useful for creativity but is problematic for regulated decision-making where decisions must be reproducible and auditable.

If a regulator or an internal auditor asks "why did you deny this customer's credit application?", the agent, when re-run to reproduce its reasoning, may follow a different reasoning path than it did when it originally made the decision. This non-reproducibility makes it impossible to verify that the original decision was made correctly or to defend it against challenge.

This risk is distinctly agentic: agent systems rely on LLM-based text generation, which is stochastic by default. A traditional deterministic system always produces the same reasoning for the same inputs.
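The stochastic decoding described above can be sketched in a few lines. This is a toy sampler, not any vendor's API: at temperature 0 the sampler reduces to argmax and is fully deterministic, while at temperature > 0 identical inputs can yield different tokens, and hence different reasoning paths, on successive runs.

```python
# Toy illustration of temperature sampling (not any vendor's API).
# temperature == 0 -> argmax, deterministic on every run.
# temperature > 0  -> softmax sampling, identical inputs can diverge.
import math
import random

def sample_token(logits: dict[str, float], temperature: float,
                 rng: random.Random) -> str:
    if temperature == 0:
        # Deterministic: always the highest-scoring token.
        return max(logits, key=logits.get)
    # Scale logits by temperature, then softmax-sample.
    scaled = {t: l / temperature for t, l in logits.items()}
    m = max(scaled.values())
    weights = {t: math.exp(l - m) for t, l in scaled.items()}
    total = sum(weights.values())
    tokens = list(weights)
    return rng.choices(tokens, [weights[t] / total for t in tokens])[0]

logits = {"APPROVE": 1.2, "DENY": 1.0}

# Temperature 0 reproduces the same decision on every run.
det = {sample_token(logits, 0, random.Random()) for _ in range(100)}

# Temperature 1 can flip the decision between runs.
rng = random.Random(42)
sampled = {sample_token(logits, 1.0, rng) for _ in range(100)}
```

With these toy logits, the 100 deterministic runs all return "APPROVE", while the sampled runs produce both conclusions, which is exactly the behavior the examiner observes in the scenario below.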

How It Materializes

A bank uses an agent to make credit limit increase decisions. A customer applies for an increase. The agent analyzes the application and produces: "DENY: customer utilization is 60%, below the 80% threshold that would trigger concern; customer's credit score improved 20 points in the last 6 months, indicating a positive trend, but does not yet meet the 750 minimum for approval for new customers."

The decision is denied. One month later, during a regulatory examination, the examiner asks the bank to explain the denial decision. The bank resubmits the same customer application to the agent and asks it to explain its reasoning. The agent produces: "APPROVE: customer has demonstrated credit management improvement with 20-point score increase in 6 months; recent payment history is strong; utilization at 60% suggests controlled usage; customer is eligible for increase based on trajectory of improvement."

The agent has reversed its own decision and provided a completely different reasoning path. This non-reproducibility is alarming to the regulator: it suggests that either (1) the original decision was arbitrary, or (2) the agent cannot reliably explain its own decisions.

DAMAGE Score Breakdown

| Dimension | Score | Rationale |
| --- | --- | --- |
| D - Detectability | 3 | Non-reproducibility is visible only when the agent is re-run on the same inputs. |
| A - Autonomy Sensitivity | 4 | The agent produces different outputs autonomously; randomness is built into the system. |
| M - Multiplicative Potential | 3 | Impact depends on how often reasoning fails to reproduce and whether it leads to decision reversals. |
| A - Attack Surface | 5 | Any LLM-based agent running with temperature > 0 is vulnerable to non-reproducibility. |
| G - Governance Gap | 5 | No standard framework requires reproducibility of agent reasoning. |
| E - Enterprise Impact | 3 | Regulatory concern, cost of implementing reproducibility controls, and potential enforcement action if non-reproducibility masks discrimination. |
| Composite DAMAGE Score | 3.3 | High. Requires priority attention and dedicated controls. |

Agent Impact Profile

How severity changes across the agent architecture spectrum.

| Agent Type | Impact | How This Risk Manifests |
| --- | --- | --- |
| Digital Assistant | Low | A human makes the final call, so the decision of record rests on human reasoning, which is consistent over time. |
| Digital Apprentice | Medium | Human review of each recommendation catches reversals, but the apprentice's proposed reasoning can still vary between runs. |
| Autonomous Agent | Critical | The agent produces non-reproducible reasoning; decisions cannot be audited consistently. |
| Delegating Agent | High | Reasoning is non-reproducible, and tool invocations may differ on successive runs. |
| Agent Crew / Pipeline | High | Multiple agents with non-reproducible reasoning compound the problem; variability accumulates across stages. |
| Agent Mesh / Swarm | Critical | Peer agents each reason non-reproducibly; aggregate outcomes are unpredictable. |

Regulatory Framework Mapping

| Framework | Coverage | Citation | What It Addresses | What It Misses |
| --- | --- | --- | --- | --- |
| SR 11-7 / MRM | Partial | Model Validation (Section 2) | Expects models to be validated and performance measured. | Does not address reproducibility of reasoning. |
| Fair Lending Laws | Partial | Various fair lending regulations | Require that decisions are not arbitrary and can be explained. | Do not anticipate non-reproducible agent reasoning. |
| NIST AI RMF 1.0 | Partial | MEASURE.1 | Recommends measuring system performance. | Does not specify reproducibility requirements. |
| EU AI Act | Partial | Article 13 (Transparency) | Requires explanations of system decisions. | Does not require that explanations be reproducible. |

Why This Matters in Regulated Industries

In banking and credit, regulatory examinations rely on the ability to audit decisions and understand reasoning. If a regulator can rerun a decision scenario and get a different outcome, they cannot verify that the original decision was made correctly. Non-reproducibility is a serious governance concern that undermines audit capability.

Fair lending law requires that credit decisions not be arbitrary. If an agent's reasoning is non-reproducible, it appears arbitrary, even if the underlying logic is sound. Regulators will flag non-reproducible agent reasoning as a control concern.

Controls & Mitigations

Design-Time Controls

  • Implement deterministic decoding: set LLM temperature to 0 (greedy decoding) for decision-making agents, and pin a fixed seed where the serving stack supports one. Deterministic decoding sacrifices some creativity, and some serving stacks still exhibit residual non-determinism even at temperature 0, so reproducibility should be verified rather than assumed.
  • Implement reasoning snapshots: when an agent makes a decision, log the exact reasoning path taken. When asked to reproduce the decision, return the logged reasoning rather than re-generating it.
  • Implement decision versioning: each decision should include a version identifier that points to the exact reasoning that was used.
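The snapshot and versioning controls above can be combined in a single decision record. A minimal sketch, with illustrative names rather than any specific framework: the record stores the verbatim reasoning produced at decision time plus a content hash over the inputs, reasoning, conclusion, and model parameters, so audits return the stored reasoning instead of re-generating it.

```python
# Sketch of a reasoning snapshot with a content-derived version id.
# All names (DecisionRecord, version_id) are illustrative assumptions.
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class DecisionRecord:
    inputs: dict        # exact inputs the agent saw
    reasoning: str      # verbatim reasoning path at decision time
    conclusion: str     # e.g. "DENY"
    model_params: dict  # temperature, sampling config, model id
    decided_at: str     # timestamp, excluded from the version hash

    @property
    def version_id(self) -> str:
        # Hash ties the record to its exact content and parameters.
        payload = json.dumps(
            {"inputs": self.inputs, "reasoning": self.reasoning,
             "conclusion": self.conclusion, "params": self.model_params},
            sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

record = DecisionRecord(
    inputs={"utilization": 0.60, "score": 710},
    reasoning="Score 710 below 750 minimum for new customers.",
    conclusion="DENY",
    model_params={"temperature": 0, "model": "example-model-v1"},
    decided_at=datetime.now(timezone.utc).isoformat(),
)
# On audit, return record.reasoning and record.version_id,
# not a fresh generation.
```

Because the version id is derived from content rather than a counter, two records with the same inputs, reasoning, and parameters share an id, and any change to the reasoning or parameters produces a new one.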

Runtime Controls

  • Implement reproducibility checks: periodically re-run decisions on the same inputs and verify that the agent produces the same reasoning and conclusion.
  • Log temperature and sampling parameters: record the exact configuration used when the agent made each decision.
  • Flag parameter changes: if parameters are changed, flag prior decisions for re-audit.
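A reproducibility check of the kind described above can be a small harness around the agent. A sketch under stated assumptions: `decide` is a hypothetical callable standing in for the agent's decision function; the check re-runs the same inputs several times and flags the decision whenever conclusions diverge.

```python
# Sketch of a periodic reproducibility check. `decide` is a
# hypothetical stand-in for the agent's decision function.
import itertools

def reproducibility_check(decide, inputs, runs: int = 3) -> dict:
    conclusions = [decide(inputs) for _ in range(runs)]
    reproducible = len(set(conclusions)) == 1
    return {"reproducible": reproducible, "conclusions": conclusions}

# A deterministic agent passes the check...
result = reproducibility_check(lambda i: "DENY", {"score": 710})

# ...while a non-deterministic one is flagged for investigation.
flip = itertools.cycle(["DENY", "APPROVE"])
flagged = reproducibility_check(lambda i: next(flip), {"score": 710})
```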

Detection & Response

  • Audit reproducibility: periodically re-run agent decisions and verify they are reproducible. Flag non-reproducible decisions for investigation.
  • Investigate non-reproducibility causes: determine whether non-reproducibility is due to randomness, differences in input data, or changes in the model.
  • Implement decision reversal for significant non-reproducibility: if a decision produces materially different conclusions on re-run, reverse the original decision and re-make it using deterministic reasoning.
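The reversal step above can be expressed as a small comparison routine. A sketch with hypothetical helper names: compare a re-run against the logged conclusion, and on material divergence re-make the decision with a deterministic decision function before marking the original for reversal.

```python
# Sketch of the re-audit step. `deterministic_decide` is a
# hypothetical callable that re-makes the decision at temperature 0.

def reaudit(logged_conclusion: str, rerun_conclusion: str,
            deterministic_decide) -> dict:
    if rerun_conclusion == logged_conclusion:
        return {"status": "reproducible", "final": logged_conclusion}
    # Materially different conclusion on re-run: re-make the
    # decision deterministically and flag the original for reversal.
    final = deterministic_decide()
    return {"status": "reversed", "final": final,
            "original": logged_conclusion}

outcome = reaudit("DENY", "APPROVE", lambda: "DENY")
```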

Address This Risk in Your Institution

Reasoning Non-Reproducibility requires architectural controls that go beyond what existing frameworks provide. Our advisory engagements are purpose-built for banks, insurers, and financial institutions subject to prudential oversight.

Schedule a Briefing