R-DG-06 Data Governance & Integrity DAMAGE 3.3 / High

Schema Drift Blindness

Agents consume data through natural language and loosely typed APIs. Schema changes cause silent misinterpretation, not failure. Analysis degrades without anyone noticing.

The Risk

Traditional data pipelines fail loudly when a schema changes. If a column is removed from a database table, any query that depends on that column fails with an explicit error. The error triggers an alert, an engineer investigates, and the pipeline is updated. The error is a feature: it forces awareness of the schema change.

Agents consume data through natural language or loosely typed APIs that do not enforce schema contracts. Suppose an agent is instructed to "fetch customer account balance" and calls an API that returns balance information. The API's schema then changes: the field "balance_cents" (integer, precise to the penny) is replaced by "balance_usd" (float, rounded to the nearest dollar). The agent receives the new response and processes it without error. It does not know the schema changed, and neither do downstream recipients. The loss of precision is silent, and rounding errors can propagate through calculations unnoticed until results diverge significantly.
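The contrast between a loosely typed consumer and one that enforces a contract can be sketched in a few lines of Python. This is a minimal illustration only: the payload values and the two reader functions are hypothetical, and the lenient reader is an assumption about how a loosely typed consumer might behave, not a description of any particular agent framework.

```python
from decimal import Decimal

# Hypothetical payloads illustrating the drift described above.
old_payload = {"balance_cents": 1234567}  # integer cents
new_payload = {"balance_usd": 12346.0}    # float, rounded to the dollar


def read_balance_lenient(payload: dict) -> Decimal:
    """Mimic a loosely typed consumer: accept any balance-like field."""
    if "balance_cents" in payload:
        return Decimal(payload["balance_cents"]) / 100
    for key, value in payload.items():
        if key.startswith("balance"):
            return Decimal(str(value))
    raise KeyError("no balance field found")


def read_balance_strict(payload: dict) -> Decimal:
    """Enforce the contract: exactly one integer field named balance_cents."""
    if set(payload) != {"balance_cents"}:
        raise ValueError(f"unexpected fields: {sorted(payload)}")
    value = payload["balance_cents"]
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("balance_cents must be an integer")
    return Decimal(value) / 100


# The lenient reader returns a plausible number for both payloads.
before = read_balance_lenient(old_payload)
after = read_balance_lenient(new_payload)
print(f"lenient: {before} -> {after} (silent difference of {after - before})")

# The strict reader turns the same drift into an explicit failure.
try:
    read_balance_strict(new_payload)
except ValueError as exc:
    print(f"strict: rejected payload ({exc})")
```

The lenient path never raises, which is the failure mode this risk describes; the strict path restores the loud failure that traditional pipelines get for free.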

Schema drift is more dangerous in natural language contexts. Suppose an agent is instructed to "summarize the customer's transaction history" and the source system changes how transactions are categorized, so that the new categorization scheme is incompatible with the old one. The agent receives transactions under the new scheme and reasons over the new categories without knowing they differ from the old ones. The agent's historical analyses and its new analyses are now inconsistent.
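One partial safeguard against the categorization case is to compare incoming labels with the set the agent's earlier analyses were built on. The sketch below assumes transactions arrive as dictionaries with a "category" key; the category names and the function are invented for illustration.

```python
# Hypothetical category set that the agent's historical analyses were built on.
KNOWN_CATEGORIES = frozenset({"groceries", "travel", "utilities"})


def unknown_categories(transactions, known=KNOWN_CATEGORIES):
    """Return category labels that the documented scheme does not contain."""
    seen = {txn["category"] for txn in transactions}
    return sorted(seen - known)


# Made-up transactions: the source system has split "travel" into two labels.
transactions = [
    {"id": 1, "category": "groceries"},
    {"id": 2, "category": "travel_air"},
    {"id": 3, "category": "travel_ground"},
]

drifted = unknown_categories(transactions)
if drifted:
    print(f"Category scheme changed; unrecognized labels: {drifted}")
```

A check like this catches only new labels. It does not catch an existing label whose meaning has been redefined, which still requires change notification from the source system.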

How It Materializes

A securities firm uses agents to generate daily market analysis reports. The agents consume market data from an internal API that provides the fields "price_open," "price_high," "price_low," and "price_close," all delivered as precise floating-point numbers. As part of a storage optimization, the firm's data team replaces these with "price_open_rounded," "price_high_rounded," "price_low_rounded," and "price_close_rounded" (integers, rounded to the nearest dollar). Because a backward-compatible change was not possible, the team did not deprecate the old fields; they simply changed the API response schema.

The agent continues to receive data through natural language queries ("what were the opening prices for this list of symbols"). The natural language abstraction shields the agent from explicitly knowing about the schema change. The agent receives rounded prices, performs technical analysis, and generates reports with rounded prices embedded. The reports are professionally formatted and look authoritative. Downstream traders read the reports and use them to inform decisions. The precision loss is hidden in the data transformation.
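To make the precision loss concrete, the following sketch computes simple daily returns from a short series of closing prices, first exact and then rounded to the nearest dollar. The prices are made up for illustration and the return calculation is a stand-in for whatever technical analysis the agent actually performs.

```python
# Made-up closing prices for a low-priced security.
exact_closes = [10.20, 10.45, 10.49, 10.51, 10.30]

# What the agent would receive after the change to *_rounded integer fields.
rounded_closes = [round(price) for price in exact_closes]


def daily_returns(prices):
    """Simple return between each pair of consecutive closes."""
    return [(later - earlier) / earlier for earlier, later in zip(prices, prices[1:])]


max_exact = max(abs(r) for r in daily_returns(exact_closes))
max_rounded = max(abs(r) for r in daily_returns(rounded_closes))

print(f"largest daily move, exact prices:   {max_exact:.2%}")
print(f"largest daily move, rounded prices: {max_rounded:.2%}")
```

With these numbers the exact series never moves more than about 2.5% in a day, while the rounded series shows a 10% jump that never happened. Any signal keyed to a move threshold would fire on the rounded data and stay quiet on the exact data.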

After three months of degraded accuracy in downstream trading decisions, the firm investigates, discovers the schema change, and traces it back to the API modification. By that time, the agent has generated 60 reports with rounded data, and the firm has made trading decisions based on degraded analysis. The firm cannot simply re-run the analysis with correct data because the agent did not retain its inputs; it consumed the data and discarded it.
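Retaining inputs is what would have made the affected reports reconstructable. A minimal sketch of input preservation follows, assuming JSON-serializable payloads; the InputArchive class and its in-memory dictionary are hypothetical stand-ins for durable, append-only storage.

```python
import hashlib
import json
from datetime import datetime, timezone


class InputArchive:
    """In-memory stand-in for durable, append-only storage (an assumption)."""

    def __init__(self):
        self._records = {}

    def preserve(self, run_id: str, payload: dict) -> str:
        """Store an exact copy of the input and return its SHA-256 digest."""
        raw = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        self._records[run_id] = {
            "raw": raw,
            "sha256": digest,
            "fields": sorted(payload),  # field names, useful for later schema audits
            "captured_at": datetime.now(timezone.utc).isoformat(),
        }
        return digest

    def replay(self, run_id: str) -> dict:
        """Return the archived input after verifying it has not been altered."""
        record = self._records[run_id]
        actual = hashlib.sha256(record["raw"].encode("utf-8")).hexdigest()
        if actual != record["sha256"]:
            raise ValueError("archived input failed integrity check")
        return json.loads(record["raw"])


archive = InputArchive()
digest = archive.preserve("report-001", {"price_close_rounded": 101})
print(archive.replay("report-001"), digest[:12])
```

Recording the field names alongside the raw payload also gives investigators a direct way to see on which run the input shape changed.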

DAMAGE Score Breakdown

Dimension | Score | Rationale
D - Detectability | 3 | Schema drift is often invisible because agents continue to function without error. Discovery occurs through downstream analysis degradation or audit investigation.
A - Autonomy Sensitivity | 2 | Occurs at all autonomy levels. Less autonomous agents may have more oversight, but the agent's reasoning still gives reviewers no visibility into the schema it consumed.
M - Multiplicative Potential | 3 | Each agent consuming a drifted schema propagates the drift. Drift impacts are multiplicative if multiple agents depend on the same data source.
A - Attack Surface | 1 | Not easily weaponized externally; primarily a structural risk. An adversary could intentionally drift a schema to degrade analysis, but this requires API control.
G - Governance Gap | 3 | Data governance frameworks assume schema monitoring and change management. Natural language abstraction breaks schema visibility.
E - Enterprise Impact | 2 | Analysis degradation, potential for incorrect downstream decisions, but impact is typically detected within weeks and corrected. Not immediately catastrophic.
Composite DAMAGE Score | 3.3 | High. Requires priority attention with dedicated controls and monitoring.

Agent Impact Profile

How severity changes across the agent architecture spectrum.

Agent Type | Impact | How This Risk Manifests
Digital Assistant | Low-Moderate | Human may notice output changes, but human may attribute changes to reasoning variation rather than schema drift.
Digital Apprentice | Moderate | Progressive autonomy means less human review. Schema drift may persist longer without detection.
Autonomous Agent | Moderate-High | Agent operates independently; no human verification of data schema consistency. Drift persists until discovered through downstream impact.
Delegating Agent | Moderate | Agent determines which APIs to invoke. May be unaware that APIs have drifted schema. Tool calls continue with drifted data.
Agent Crew / Pipeline | High | Multiple agents consume schema-drifted data. Downstream agents amplify the drift impact.
Agent Mesh / Swarm | High | Multiple agents share data sources. Schema drift affects all agents simultaneously. Coordinated drift across entire mesh.

Regulatory Framework Mapping

Framework | Coverage | Citation | What It Addresses | What It Misses
BCBS 239 | Partial | Principle 4 (Data Validation) | Requires validation controls for data accuracy and completeness. | Does not address schema change detection in natural language data access patterns.
NIST AI RMF 1.0 | Partial | MAP 2.2 (Data Quality) | Recommends data quality validation. | Does not address schema drift detection in loosely typed API consumption.
EU AI Act | Minimal | Article 24 (Documentation) | Requires documentation of data handling. | Does not address schema change management in AI systems.
MAS AIRG | Minimal | Section 6.1 (Governance) | General governance requirements. | Does not address schema drift monitoring.
ISO 42001 | Minimal | Section 6.1 | Information management requirements. | Does not address schema change detection.
OWASP LLM Top 10 | Minimal | General API security | General API security principles. | Does not address schema drift.

Why This Matters in Regulated Industries

In trading and markets, precision matters. Rounded prices produce different technical analysis results than exact prices. Analysis based on rounded data can lead to incorrect trading signals. In banking, schema drift affecting customer account data can lead to incorrect balance calculations or exposure assessments. In insurance, schema drift in claims data can affect reserve calculations and underwriting models. The drift is often silent: the agent continues to function, outputs continue to be produced, but outputs become gradually less accurate.

Regulators increasingly expect institutions to monitor data quality, including schema changes. If an institution discovers it has been operating with drifted schema for months without detecting it, the regulator's confidence in data governance and monitoring controls is compromised. The institution should have detected the schema change through schema change management processes or through data quality monitoring that compares outputs to baselines.

Controls & Mitigations

Design-Time Controls

  • Implement strict data contracts for all APIs agents consume: define schemas explicitly, version them, and require that any backward-incompatible change trigger an agent redesign review.
  • Require agents to consume data through strongly typed schema interfaces (gRPC, Protocol Buffers, OpenAPI with strict validation) rather than natural language queries.
  • Use Component 1 (Agent Registry) to document schema versions for each agent-to-data-source relationship. Establish change notification procedures that escalate schema version changes to agent owners.
  • Establish data schema governance: maintain a schema registry with versioning, change history, and deprecation notices. Require all agent data access to declare schema version dependency.
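The contract and registry controls above can be sketched as a versioned schema description, a check for backward-incompatible changes, and a lookup of the agents that declared a dependency on the changed source. All names here (SchemaVersion, breaking_changes, the agent and source identifiers) are hypothetical, and a production schema registry would carry far more metadata than this.

```python
from dataclasses import dataclass


@dataclass
class SchemaVersion:
    source: str
    version: str
    fields: dict  # field name -> type name


def breaking_changes(old: SchemaVersion, new: SchemaVersion) -> list:
    """List removed or retyped fields; added fields are treated as compatible."""
    problems = []
    for name, type_name in old.fields.items():
        if name not in new.fields:
            problems.append(f"removed: {name}")
        elif new.fields[name] != type_name:
            problems.append(f"retyped: {name} {type_name} -> {new.fields[name]}")
    return problems


# Hypothetical registry entries: which agent depends on which source and version.
AGENT_DEPENDENCIES = {
    "market-report-agent": ("market_prices", "1.0.0"),
    "kyc-summary-agent": ("customer_profile", "2.1.0"),
}


def agents_needing_review(old: SchemaVersion, new: SchemaVersion) -> list:
    """Agents pinned to the old version of a source that changed incompatibly."""
    if not breaking_changes(old, new):
        return []
    return sorted(
        agent
        for agent, (source, version) in AGENT_DEPENDENCIES.items()
        if source == old.source and version == old.version
    )


v1 = SchemaVersion("market_prices", "1.0.0", {"price_open": "float", "price_close": "float"})
v2 = SchemaVersion("market_prices", "2.0.0", {"price_open_rounded": "int", "price_close_rounded": "int"})

print(breaking_changes(v1, v2))
print(agents_needing_review(v1, v2))
```

Treating added fields as compatible and removed or retyped fields as breaking is a common convention, but it is a design choice; an institution may reasonably decide that any change requires review.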

Runtime Controls

  • Implement schema validation gates: require agents to validate incoming data against expected schema before processing. Log schema validation failures and alert operators.
  • Monitor data characteristics during runtime: implement statistical validation comparing real-time data distributions to historical baselines. Alert if distributions change significantly.
  • Require agents to preserve input data for all reasoning operations: store exact copies of data consumed in each reasoning pass. Enable reconstruction of historical inputs.
  • Use Component 10 (Kill Switch) to automatically halt agents whose output distributions diverge significantly from historical baselines without schema change notification.
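A validation gate and a simple statistical check can be sketched as follows. The expected field set mirrors the market data example earlier in this entry. The choice of statistic (the share of values with a nonzero fractional part) and the tolerance are illustrative assumptions chosen because they expose rounding; they are not a recommended general-purpose drift test.

```python
# Expected shape of one market data record, per the documented contract.
EXPECTED = {"price_open": float, "price_high": float, "price_low": float, "price_close": float}


def validate_record(record: dict, expected: dict = EXPECTED) -> list:
    """Return a list of contract violations; an empty list means the record passes."""
    violations = []
    for name, expected_type in expected.items():
        if name not in record:
            violations.append(f"missing field: {name}")
        elif type(record[name]) is not expected_type:
            violations.append(f"wrong type for {name}: {type(record[name]).__name__}")
    for name in record:
        if name not in expected:
            violations.append(f"unexpected field: {name}")
    return violations


def fractional_share(values) -> float:
    """Share of values that are not whole numbers."""
    values = list(values)
    if not values:
        raise ValueError("no values to analyze")
    return sum(1 for v in values if float(v) % 1 != 0) / len(values)


def distribution_shifted(baseline, current, tolerance: float = 0.25) -> bool:
    """Flag a change in how often prices carry cents (tolerance is an assumption)."""
    return abs(fractional_share(baseline) - fractional_share(current)) > tolerance


drifted_record = {
    "price_open_rounded": 101, "price_high_rounded": 103,
    "price_low_rounded": 100, "price_close_rounded": 102,
}
problems = validate_record(drifted_record)
if problems:
    print(f"halting before reasoning: {len(problems)} contract violations")

print(distribution_shifted([101.25, 99.5, 100.75, 102.0], [101, 100, 101, 102]))
```

The gate catches the renamed fields immediately. The distribution check is a second line of defense for cases where field names stay the same but the values change character.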

Detection & Response

  • Conduct quarterly schema audits: verify schema versions for all data sources agents consume. Compare actual schemas to documented schema versions.
  • Implement schema change monitoring: establish alerting on schema changes from connected data sources. Notify agent owners of changes immediately; require acknowledgment and impact assessment.
  • Monitor agent output quality metrics: detect statistical divergence in agent outputs that could indicate schema drift. Investigate when quality metrics deteriorate without known cause.
  • Establish schema drift incident response: immediately audit all data consumed since the drift began, re-analyze with the correct schema, and notify downstream consumers of affected analyses.
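For audits and incident response, one way to compare actual schemas with documented ones is to fingerprint the shape of each payload and scan preserved run inputs for mismatches. The sketch below assumes per-run inputs were retained as dictionaries; the function names and run identifiers are hypothetical.

```python
import hashlib
import json


def schema_fingerprint(payload: dict) -> str:
    """Hash the field names and value types of a payload, ignoring the values."""
    shape = sorted((name, type(value).__name__) for name, value in payload.items())
    return hashlib.sha256(json.dumps(shape).encode("utf-8")).hexdigest()


def affected_runs(run_log, documented_fingerprint: str) -> list:
    """Return IDs of runs whose input shape did not match the documented schema."""
    return [
        run_id
        for run_id, payload in run_log
        if schema_fingerprint(payload) != documented_fingerprint
    ]


# Fingerprint of the documented schema, derived from a reference payload.
documented = schema_fingerprint({"price_close": 0.0})

# Made-up run log of (run ID, preserved input) pairs.
run_log = [
    ("day-1", {"price_close": 101.37}),
    ("day-2", {"price_close_rounded": 101}),
    ("day-3", {"price_close_rounded": 99}),
]

print(affected_runs(run_log, documented))
```

The first mismatching run marks where the drift began, which bounds the set of analyses that need to be re-run and the downstream consumers that need to be notified.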
