Data Governance for AI in Financial Services

The Foundation You Cannot Skip

Every conversation about AI governance eventually arrives at the same place: the data. An AI system is only as trustworthy as the data it learns from, reasons over, and acts upon. This sounds obvious, but the implications are profound and frequently underestimated. Regulators across every major jurisdiction — from the US Federal Reserve to the EU's AI Act to Singapore's MAS — explicitly require that institutions demonstrate not just that their AI models are well-governed but that the data feeding those models is accurate, complete, timely, representative, and properly stewarded. You cannot satisfy an AI governance requirement without first satisfying the data governance requirement that lies beneath it.

For banks and financial institutions, this is not a new obligation. Data governance has been a regulatory expectation for over a decade, anchored in frameworks like BCBS 239 (the Basel Committee's principles for risk data aggregation and reporting). What has changed is the stakes: when data quality failures affected a quarterly risk report, the consequences were serious but contained. When data quality failures affect an AI system making real-time lending decisions, fraud determinations, or customer communications at scale, the consequences multiply by orders of magnitude. The speed and autonomy of AI amplify every weakness in the data layer that lies beneath it.

The Data Landscape in a Modern Bank

Understanding why data governance is so challenging requires understanding where data actually lives in a modern financial institution. Banks typically operate across multiple data environments, each with different characteristics and different governance challenges.

Data warehouses and data marts contain curated, integrated, mastered, and actively stewarded data. This is the gold standard: data that has been validated, reconciled, and documented. It is highly trusted but also, by its nature, delayed. The curation process takes time, so warehouse data reflects yesterday's or last week's reality, not this moment's. For regulatory reporting and historical analysis, that latency is acceptable. For real-time AI decision-making, it may not be.

Data lakes hold vast quantities of raw data in its native format — transaction logs, event streams, unstructured documents, third-party feeds. This data is current but untamed. Working with it requires deep expertise and contextual understanding. A data engineer who knows which fields are reliable, which transformations are needed, and which edge cases to watch for can extract tremendous value. However, an AI system trained on raw lake data without that context can learn the wrong patterns and produce confidently wrong outputs.

The industry is increasingly moving toward a middle ground: data products. A data product is a curated, documented, fit-for-purpose dataset that is designed to be reusable across multiple consumers — whether those consumers are analysts, dashboards, or AI models. Data products apply the principles of product management to data: they have defined owners, quality standards, documentation, versioning, and service-level agreements. This approach, drawing on the data mesh architecture principles articulated by Zhamak Dehghani, treats data not as a byproduct of operational systems but as a first-class organizational asset with clear accountability and ownership.

Feature Stores, Metrics, and the Consistency Problem

For AI and machine learning specifically, the data pipeline introduces additional governance requirements that most traditional data governance frameworks were not designed to address.

Most data science work requires raw data to be transformed before it becomes useful. Features — the derived variables that machine learning models actually consume — are typically created through derivation, aggregation, or formula. A credit risk model might use "average account balance over 90 days" as a feature, derived from daily balance records. An anti-money laundering model might use "number of cross-border transactions exceeding $10,000 in the past 30 days," derived from transaction logs. These transformations must be consistent: the same feature must be computed the same way whether it is used for model training, real-time inference, regulatory reporting, or backtesting. If the training pipeline computes a feature one way and the production system computes it differently, the model's predictions in production will diverge from its validated performance. This divergence is known as training-serving skew.
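The guard against training-serving skew is to define each feature exactly once and have both the training pipeline and the serving system import that single definition. A minimal sketch in Python; the function name, the missing-day policy, and the 90-day window are illustrative choices, not a prescribed implementation:

```python
from datetime import date, timedelta

def avg_balance_90d(daily_balances: dict[date, float], as_of: date) -> float:
    """Average account balance over the 90 days ending at `as_of`.

    Defined once and imported by BOTH the training pipeline and the
    serving system, so the feature cannot silently diverge. Days with
    no recorded balance are excluded from the average here; that is
    exactly the kind of edge-case decision that must be made
    identically everywhere the feature is consumed.
    """
    window_start = as_of - timedelta(days=89)  # 90 days inclusive of as_of
    in_window = [bal for day, bal in daily_balances.items()
                 if window_start <= day <= as_of]
    if not in_window:
        return 0.0  # policy choice: no data in window maps to zero
    return sum(in_window) / len(in_window)
```

If training and serving each reimplemented this logic, a one-day difference in the window boundary or a different missing-day policy would be enough to make production predictions diverge from validated performance.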

Enterprise feature stores — platforms like Feast, Tecton, or capabilities built into Databricks and cloud platforms — exist to manage exactly this problem. A feature store provides a central registry of feature definitions, ensures consistent computation across training and serving environments, tracks feature lineage (where the data came from and how it was transformed), and enables monitoring for distribution shifts that might indicate data quality degradation.
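The platforms named above each have their own APIs, which are not reproduced here. Instead, a deliberately simplified sketch of the bookkeeping a feature registry performs: versioned definitions with a single authoritative transform, a named owner, and lineage back to source tables. All names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class FeatureDefinition:
    """One versioned entry in a central feature registry (illustrative schema)."""
    name: str
    version: int
    source_tables: tuple[str, ...]  # lineage: the upstream inputs
    compute: Callable               # the single authoritative transform
    owner: str

class FeatureRegistry:
    def __init__(self) -> None:
        self._features: dict[tuple[str, int], FeatureDefinition] = {}

    def register(self, feature: FeatureDefinition) -> None:
        key = (feature.name, feature.version)
        if key in self._features:
            # Re-registering an existing version is rejected: changing a
            # feature means publishing a new version, never mutating one.
            raise ValueError(f"{feature.name} v{feature.version} already registered")
        self._features[key] = feature

    def get(self, name: str, version: int) -> FeatureDefinition:
        return self._features[(name, version)]

    def lineage(self, name: str, version: int) -> tuple[str, ...]:
        """Where did this feature's data come from?"""
        return self._features[(name, version)].source_tables
```

Real feature stores add far more (online/offline serving, materialization, distribution monitoring), but the governance core is this: one registry, one definition per version, and traceable lineage.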

The consistency problem extends beyond AI pipelines into reporting and analytics. Banks produce thousands of metrics — customer lifetime value, net interest margin, loan default rates, regulatory capital ratios — that must be defined consistently across every report, dashboard, and model that uses them. When the definition of "delinquent loan" varies between the risk team's model and the finance team's report, the institution cannot trust either output. Semantic layers and metrics platforms (such as dbt's semantic layer) address this by centralizing metric definitions so that every consumer — human or algorithmic — works from the same source of truth.
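The same idea in miniature: one shared predicate defines "delinquent," and every consumer, whether the risk team's model or the finance team's report, computes its rate from that predicate. The 90-day threshold below is illustrative, not a policy recommendation:

```python
from datetime import date

# The single shared definition of delinquency. The risk model's feature
# pipeline and the finance team's reporting query both import this
# predicate, so their outputs cannot disagree on what the word means.
DELINQUENCY_THRESHOLD_DAYS = 90  # illustrative; the real cutoff is a policy choice

def is_delinquent(due_date: date, as_of: date, paid: bool) -> bool:
    return (not paid) and (as_of - due_date).days > DELINQUENCY_THRESHOLD_DAYS

def delinquency_rate(loans, as_of: date) -> float:
    """Share of loans delinquent as of `as_of`.

    `loans` is an iterable of (due_date, paid) pairs.
    """
    loans = list(loans)
    if not loans:
        return 0.0
    delinquent = sum(is_delinquent(due, as_of, paid) for due, paid in loans)
    return delinquent / len(loans)
```

A semantic layer generalizes this pattern: metric definitions live in one governed place, and reports, dashboards, and models all compile against them.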

Signals, Telemetry, and the Monitoring Imperative

AI systems produce outputs. In banking, those outputs are often signals: a fraud score, a credit risk rating, a recommended action, a customer segment classification. These signals drive decisions that affect real people and real money. Governing the model is necessary but not sufficient. The institution must also capture and store every signal the model produces, along with the parameters, input data, and contextual metadata that produced it. This is model telemetry, and without it the institution cannot answer the most basic governance questions: what did the model decide, why, and on what basis?
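A telemetry record can be as simple as the sketch below. The schema and field names are illustrative, not a standard; the point is that the signal travels with the inputs and parameters that produced it:

```python
import json
import time
import uuid

def build_decision_record(model_name: str, model_version: str,
                          inputs: dict, parameters: dict,
                          score: float, decision: str) -> dict:
    """Assemble one telemetry record for a single model output.

    The score and decision are stored together with a snapshot of the
    inputs the model saw and the parameters active at decision time,
    so the decision can later be explained and replayed.
    """
    return {
        "decision_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_name": model_name,
        "model_version": model_version,
        "inputs": inputs,          # snapshot of the features the model saw
        "parameters": parameters,  # thresholds/config active at the time
        "score": score,
        "decision": decision,
    }

def append_record(log_path: str, record: dict) -> None:
    # Append-only JSON Lines log: every signal becomes auditable evidence.
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

Production systems would add schema validation, retention policies, and access controls, but even this minimal shape is enough to answer "what did the model decide, and on what basis?"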

Telemetry is the foundation for monitoring and drift detection. Consider a loan approval process where the model has historically approved 78% of applications. If the approval rate drops to 52% over a two-week period, the institution needs to diagnose why. Was it because a specific customer segment is now failing? Is required input data missing due to an upstream system failure? Has the model drifted, so its learned patterns no longer match reality? Does the model need retraining because the economic environment has shifted? Is it fraud — someone manipulating inputs to trigger denials? Or has the applicant cohort itself changed (perhaps a marketing campaign attracted a different demographic)?

Each of these causes requires a different response, and distinguishing between them requires the telemetry to exist in the first place. Without captured signals, parameters, and input snapshots, the institution is guessing. With them, it can systematically isolate the cause and respond appropriately. This is the operational reality that regulators are increasingly expecting: not just that you built the model correctly but that you can demonstrate ongoing awareness of how it is performing and what is driving that performance.
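Once telemetry exists, the first diagnostic step, localizing the drop to a cohort, is a simple aggregation. A sketch assuming each telemetry record carries a hypothetical segment label and a decision field:

```python
from collections import defaultdict

def approval_rate_by_segment(records) -> dict:
    """Per-segment approval rates from telemetry records.

    `records` is an iterable of dicts with at least a "segment" and a
    "decision" key (field names are illustrative). A drop in the
    headline approval rate can then be traced to the cohort driving it
    rather than guessed at.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for rec in records:
        totals[rec["segment"]] += 1
        if rec["decision"] == "approve":
            approvals[rec["segment"]] += 1
    return {seg: approvals[seg] / totals[seg] for seg in totals}
```

The same pattern extends to slicing by input completeness, model version, or time window, which is how the candidate causes listed above get systematically ruled in or out.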

The Drift Problem: Data, Model, Concept, and Epistemic

Drift is the gradual degradation of an AI system's reliability over time, and it comes in several forms that banks must understand and monitor for.

Data drift occurs when the statistical distribution of input data changes from what the model was trained on. If a credit model was trained on applications from 2019–2023 and the applicant pool shifts significantly in 2026 — different income distributions, different employment patterns, different geographic concentrations — the model's predictions become less reliable even though the model itself has not changed. The world moved; the model did not.

Concept drift is subtler and more dangerous. It occurs when the underlying relationship between inputs and outcomes changes. Fraud patterns are a classic example: criminals adapt their techniques, so the patterns that indicated fraud last year may no longer apply. During COVID-19, the relationship between employment status and credit default changed fundamentally — government support programs meant that unemployed borrowers were not defaulting at historical rates. Models trained on pre-pandemic data produced systematically wrong predictions because the economic logic they had learned no longer held.

Model drift is the cumulative effect: the model's overall performance degrades as data drift and concept drift compound. Standard monitoring can detect this through accuracy metrics, precision/recall tracking, and population stability indices.
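The population stability index mentioned above compares a baseline distribution against the current one, bin by bin. A minimal implementation; the 0.1/0.25 interpretation bands are a common industry rule of thumb, not a regulatory standard:

```python
import math

def population_stability_index(expected_counts, actual_counts,
                               eps: float = 1e-6) -> float:
    """PSI between a baseline ("expected") and current ("actual")
    distribution over the same bins.

    Common rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift,
    > 0.25 significant shift warranting investigation.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    psi = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, eps)  # floor guards against empty bins
        a_pct = max(a / a_total, eps)
        psi += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return psi
```

Run against binned feature values (income bands, score deciles), PSI gives a single monitorable number per feature, which is why it appears so often in model risk dashboards.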

Epistemic drift is the most insidious form and the hardest to detect. It occurs when the knowledge and assumptions embedded in the AI system gradually diverge from reality, not because the model's mathematics are wrong but because the premises it reasons from are no longer valid. An AI system can execute flawless logic on stale, incomplete, or contextually obsolete information and produce confidently wrong conclusions. Standard performance metrics will not catch this because the model is performing correctly given its inputs, though the inputs themselves have become unreliable. Detecting epistemic drift requires monitoring the validity of the premises, not just the quality of the outputs. This represents a fundamentally different kind of oversight that most monitoring frameworks do not yet address.
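Monitoring premise validity can begin with something as unglamorous as freshness checks on the sources a model reasons from. A sketch with hypothetical premise names and illustrative validity windows, not policy recommendations:

```python
from datetime import datetime, timedelta

# Each premise the model reasons from carries a maximum tolerated age.
# Names and windows are illustrative; the real inventory of premises is
# itself a governance artifact that must be maintained.
PREMISE_MAX_AGE = {
    "interest_rate_curve": timedelta(days=1),
    "fraud_typology_rules": timedelta(days=90),
    "kyc_sanctions_list": timedelta(days=7),
}

def stale_premises(last_refreshed: dict, now: datetime) -> list:
    """Return premises whose source data has outlived its validity window.

    Output-quality metrics can stay green while the model reasons from
    these stale inputs; this check monitors the premises directly
    instead. A premise with no refresh record at all counts as stale.
    """
    return sorted(
        name for name, max_age in PREMISE_MAX_AGE.items()
        if now - last_refreshed.get(name, datetime.min) > max_age
    )
```

Freshness is only the first layer; fuller premise monitoring would also check whether the source still means what it meant (schema changes, redefined upstream metrics, discontinued feeds), but even this layer catches failures that output metrics never will.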

What Regulators Expect

Across jurisdictions, the regulatory message is converging: data governance is not optional infrastructure that supports AI governance from below but a co-equal requirement.

SR 11-7, the foundational US model risk management guidance from the Federal Reserve and OCC, explicitly requires that institutions govern data quality as part of the model lifecycle. Data inputs must be appropriate for the model's purpose, accurately processed, and subject to ongoing quality monitoring. The guidance was written in 2011 for traditional statistical models, but regulators have made clear it applies with equal or greater force to AI systems.

BCBS 239, issued by the Basel Committee in 2013, established 14 principles for risk data aggregation and reporting that remain the global benchmark for banking data governance. These principles require accuracy, completeness, timeliness, and adaptability of risk data — precisely the qualities that AI systems demand. Notably, a 2020 Basel Committee progress report found that none of the 25 assessed banks were fully compliant with BCBS 239, even after seven years. Data governance in banking remains a work in progress, and AI is raising the bar further.

The EU AI Act, Article 10, requires that training, validation, and testing datasets for high-risk AI systems be relevant, sufficiently representative, and, to the best extent possible, free of errors and complete in view of their intended purpose. It also requires documented data governance practices covering design choices, data collection processes, data preparation operations, and examination for possible biases along with mitigation measures.

MAS AIRG in Singapore requires that AI data be representative, appropriately protected, and subject to governance standards that are "uplifted" beyond general data governance requirements to address AI-specific risks including bias, representativeness, and temporal validity.

The practical implication is clear: an institution that cannot demonstrate robust data governance will not be able to satisfy AI governance requirements regardless of how sophisticated its model governance framework appears on paper.

The Business Case: Data Quality Pays for Itself

The cost of poor data quality in financial services is well-documented. McKinsey research indicates that banks save 30–40% in data management costs through disciplined data lifecycle management, and that institutions leveraging high-quality data generate 5–6% more revenue and are 20% more profitable than peers. Gartner estimates the global data governance market reached $4.4 billion in 2024 and will grow to $18 billion by 2032, driven primarily by AI adoption and regulatory pressure. Yet Gartner also predicts that 80% of data governance initiatives will fail by 2027 for lack of clear strategic positioning. The warning is plain: investment alone is not sufficient; it must be paired with the right architecture and accountability.

The Zillow iBuying failure is perhaps the most instructive cautionary tale. Zillow's home-buying algorithm lost over $400 million because it relied on structured property data — square footage, bedroom count — while failing to account for unstructured and contextual factors: neighborhood dynamics, local market sentiment, property condition. The model was mathematically sound. The data was incomplete. The losses were real. For a bank making lending decisions at scale, the parallel is direct: a credit model trained on incomplete or unrepresentative data can produce systematically biased outcomes that trigger regulatory enforcement, customer harm, and financial loss all at once.

How Corvair Helps

Corvair's architecture-first approach treats data governance as a foundational layer of AI governance, not an afterthought. We help institutions map data lineage from source through transformation to model input, design monitoring frameworks that detect all four forms of drift, and build the telemetry infrastructure that turns model outputs from opaque signals into auditable, explainable decisions.

Our methodology directly addresses the regulatory expectation that institutions demonstrate not just model quality but data quality — because regulators understand, as banks must, that the two are inseparable.
