Beyond "shift left": How AI-powered data agents are transforming financial services data quality
Financial institutions still struggle with data quality, despite double-digit growth in data spending over the past few years (according to Forrester and Gartner studies). The insatiable appetite for data has driven incredible growth in cloud and lakehouse adoption, yet data maturity has remained flat or only marginally improved.
Why can't we get data quality right?
The popular "shift left" philosophy emphasizes ensuring data quality at the point of entry. While this makes intuitive sense, Capital One and Netflix studies suggest this approach is insufficient. Netflix's research concluded that 70% of data quality issues occur at the integration stage – typically at the edges of the data ecosystem, where hundreds of regulatory and business reporting teams extract and combine data for executive decisions and regulatory oversight.
The problem is that these "last mile" teams may lack a clear understanding of domain data and may improperly use or combine it. Not only that, but there's significant variance in their approaches to data quality, from ad-hoc methods to highly structured frameworks. This variance makes it challenging to create structured feedback loops to upstream providers.
Even where data quality systems exist, they typically involve heavily manual, human-in-the-loop processes built on subjective judgment. The resulting processes are slow and lack rigorous structure, and because they lack a common data language (the semantic layer), investigation and resolution become painfully laborious.
How AI can transform data quality observation
The persistent challenges in data quality demand more than incremental improvements. Manual validation approaches have reached their limits, creating a critical need for a new way of understanding data interactions. Generative AI (GenAI) emerges as a valuable solution that goes beyond traditional validation rules, offering unique capabilities to infer relationships and uncover patterns that aren't obvious to humans.
Traditional validation rules (like "data must be a number between 1 and 100") often miss the more subtle problems that signal deeper issues. GenAI detects nuanced patterns that traditional methods overlook (a small sketch follows this list):
Gradual drift in values over time, like declining net interest margins in a retail banking portfolio despite interest rate stability.
Sudden changes in data distribution, such as customer transaction patterns abruptly shifting after rate structure changes.
Unusual combinations of valid values that might indicate process breakdowns.
Unexpected gaps or clusters in time-series data, like high-dollar claims clustering just outside standard review windows.
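To make these pattern types concrete, here is a minimal Python sketch of the first two checks, gradual drift and sudden distribution shift, using plain statistics as a stand-in for what a GenAI-driven detector would do with far richer context. The series, window sizes, and thresholds are all hypothetical.

```python
# Illustrative sketch only: simple statistical checks that go beyond a static
# "between 1 and 100" rule. Column values, windows, and thresholds are hypothetical.
import numpy as np
from scipy import stats

def detect_gradual_drift(values: np.ndarray, window: int = 30, z_threshold: float = 3.0) -> bool:
    """Flag a series whose recent rolling mean has drifted away from its history,
    e.g. slowly declining net interest margins while rates stay flat."""
    if len(values) < 2 * window:
        return False
    baseline, recent = values[:-window], values[-window:]
    z = abs(recent.mean() - baseline.mean()) / (baseline.std(ddof=1) / np.sqrt(window))
    return z > z_threshold

def detect_distribution_shift(before: np.ndarray, after: np.ndarray, alpha: float = 0.01) -> bool:
    """Flag an abrupt change in distribution, e.g. transaction amounts after a
    rate-structure change, using a two-sample Kolmogorov-Smirnov test."""
    _, p_value = stats.ks_2samp(before, after)
    return p_value < alpha

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Net interest margins: 120 stable periods, then 30 periods of quiet decline.
    margins = np.concatenate([rng.normal(2.5, 0.05, 120), rng.normal(2.3, 0.05, 30)])
    print("gradual drift:", detect_gradual_drift(margins))
    # Transaction amounts before and after a rate-structure change.
    txns_before = rng.lognormal(3.0, 0.4, 500)
    txns_after = rng.lognormal(3.4, 0.6, 500)
    print("distribution shift:", detect_distribution_shift(txns_before, txns_after))
```

A static range rule would pass every individual value in both examples; the drift and the shift only become visible when the series is examined as a whole.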
AI-powered egress detection: The missing piece in financial data quality
These nuanced patterns expose a fundamental limitation in traditional data validation: The inability to capture complex, cross-system interactions. The most critical insights emerge not at data entry, but at the "egress point" where information flows to decision-makers. This is where the true complexity of data interactions becomes visible, revealing why point-in-time validation falls short.
Most financial organizations focus on catching data anomalies at ingestion or during processing, but this approach misses crucial insights. When anomalous data reaches board reports, regulatory filings, or executive dashboards, the consequences are immediate and significant.
The egress point provides rich context from all upstream data flows, enabling detection of subtle cross-system anomalies that would be invisible earlier. For instance, individual teams might flag a spike in mortgage prepayments as suspicious, but an anomaly detection system recognizes that spike as legitimate by correlating it with interest rate drops and the bank's recent refinancing promotion campaign.
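As a hedged illustration of that egress-point reasoning, the sketch below only escalates a prepayment spike when the available cross-system context fails to explain it. The field names, thresholds, and context sources (a rate feed and a campaign calendar) are assumptions for illustration, not a description of any particular platform.

```python
# Hedged sketch: an egress-point check that raises an alert only when an anomaly
# is NOT explained by known upstream context. All fields and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class EgressContext:
    rate_change_bps: float        # change in the benchmark rate over the period
    active_refi_campaign: bool    # flag from the marketing campaign calendar

def prepayment_alert(prepayment_rate: float, baseline_rate: float, ctx: EgressContext) -> str:
    """Classify a mortgage prepayment spike at the reporting (egress) point."""
    spike = prepayment_rate > 1.5 * baseline_rate
    if not spike:
        return "normal"
    # Correlate with cross-system context before escalating to decision-makers.
    if ctx.rate_change_bps <= -50 or ctx.active_refi_campaign:
        return "explained: rate drop and/or refinancing promotion"
    return "escalate: unexplained prepayment spike in executive report"

if __name__ == "__main__":
    ctx = EgressContext(rate_change_bps=-75, active_refi_campaign=True)
    print(prepayment_alert(prepayment_rate=0.09, baseline_rate=0.05, ctx=ctx))
```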
This transforms anomaly detection from a mere data quality tool into a strategic business capability that validates your institution's narrative across all data domains.
AI guardrails for smarter data integration
The insights gained from egress point analysis reveal a critical opportunity: Enabling entirely new types of data validation that transcend traditional constraint-based checking.
Modern data composition tools allow for dynamic, context-aware validation that detects subtle patterns, cross-system inconsistencies, and complex relationship anomalies that were previously undetectable.
Modern low-data-movement tools like Trino (formerly PrestoSQL) and Hasura Data Delivery Network (DDN) have dramatically increased data composition capabilities. These technologies enable organizations to query and combine data across multiple sources without large-scale data movement, reducing latency and infrastructure complexity. Trino allows federated queries across disparate data sources – from data lakes to relational databases – while Hasura DDN provides a unified data access layer that dynamically composes data from different domains.
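As a rough illustration of what such federation looks like in practice, the following sketch uses the Trino Python client to join a lakehouse table with an operational PostgreSQL source in a single query. The catalog, schema, and table names are hypothetical, and connection details depend entirely on your own cluster.

```python
# Hedged sketch of a federated Trino query joining a data-lake table with a
# relational source in one statement. Catalogs, schemas, and tables are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.example.internal",  # assumed internal endpoint
    port=8080,
    user="data_quality_agent",
)

SQL = """
SELECT l.loan_id,
       l.outstanding_balance,           -- from the lakehouse catalog
       p.current_rate                   -- from an operational PostgreSQL system
FROM   lakehouse.mortgage.loan_balances AS l
JOIN   postgresql.pricing.loan_rates    AS p
       ON l.loan_id = p.loan_id
WHERE  l.as_of_date = DATE '2024-12-31'
"""

cursor = conn.cursor()
cursor.execute(SQL)
for loan_id, balance, rate in cursor.fetchall():
    print(loan_id, balance, rate)
```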
This power is often viewed with trepidation, but that wariness overlooks a crucial truth: data composition is both inevitable and potentially revealing. When teams combine data in novel ways using these advanced tools, they often uncover hidden insights and potential quality issues that might otherwise go undetected.
The key lies in building adaptive systems where AI-powered anomaly detection serves as a continuous feedback mechanism. These systems can identify several types of issues particularly relevant to financial services (illustrated in the sketch after this list):
Semantic inconsistencies when data from different domains is combined
Temporal anomalies that expose synchronization issues in time-series data
Business rule violations that weren't visible when data was considered separately
Data quality issues that single-system validation missed
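Below is a minimal sketch of what such composed-data checks might look like, assuming hypothetical trade, settlement, and ledger records; the tolerances and the T+2 rule are illustrative placeholders rather than prescriptions.

```python
# Hedged sketch of cross-system checks that only become possible once domains
# are composed. Record shapes, tolerances, and domain names are hypothetical.
from datetime import datetime, timedelta

def check_cross_system(trade: dict, settlement: dict, ledger: dict) -> list[str]:
    issues = []
    # Semantic inconsistency: the same concept carried differently across domains.
    if trade["currency"] != ledger["currency"]:
        issues.append("semantic: trade and ledger report different currencies")
    # Temporal anomaly: settlement timestamp outside the expected T+2 window.
    t_trade = datetime.fromisoformat(trade["executed_at"])
    t_settle = datetime.fromisoformat(settlement["settled_at"])
    if not (timedelta(0) <= t_settle - t_trade <= timedelta(days=2)):
        issues.append("temporal: settlement outside T+2 window")
    # Business rule visible only across domains: ledger posting must match trade notional.
    if abs(trade["notional"] - ledger["posted_amount"]) > 0.01:
        issues.append("business rule: ledger posting does not match trade notional")
    return issues

if __name__ == "__main__":
    trade = {"currency": "USD", "executed_at": "2024-11-04T14:30:00", "notional": 1_000_000.00}
    settlement = {"settled_at": "2024-11-08T09:00:00"}
    ledger = {"currency": "EUR", "posted_amount": 1_000_000.00}
    print(check_cross_system(trade, settlement, ledger))
```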
Leveraging LLMs and GenAI for financial data intelligence
Adaptive data composition strategies reveal a deeper potential by transforming data into a more dynamic, intelligence-driven resource. As these techniques mature, artificial intelligence emerges as the critical technology to unlock this potential. Large language models (LLMs) and GenAI offer significant capabilities for understanding complex data relationships, but they require sophisticated guidance to realize their full potential.
Data access layer metadata serves as a natural graph structure to guide AI systems in making appropriate data combinations. Imagine an enterprise LLM system evolving beyond simple data validation, learning to understand the natural connections between different parts of your business – like how customer transaction data and product inventory interrelate through complex business logic and patterns.
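One way to picture this guidance, purely as a sketch, is a small relationship graph derived from data access layer metadata that an LLM agent must stay within when proposing data combinations. The entities, keys, and validation logic below are hypothetical stand-ins for what a real semantic layer would expose.

```python
# Hedged sketch: treating data-access-layer metadata as a graph that constrains
# which combinations an LLM agent may propose. All entities and keys are hypothetical.
METADATA_GRAPH = {
    ("customer", "transaction"): "customer_id",   # customers have transactions
    ("transaction", "product"): "product_id",     # transactions reference products
    ("product", "inventory"): "sku",              # products map to inventory positions
}

def allowed_join(left: str, right: str) -> str | None:
    """Return the join key if the semantic layer documents a relationship."""
    return METADATA_GRAPH.get((left, right)) or METADATA_GRAPH.get((right, left))

def validate_llm_plan(plan: list[tuple[str, str]]) -> list[str]:
    """Reject any step in an LLM-proposed composition that has no documented edge."""
    problems = []
    for left, right in plan:
        if allowed_join(left, right) is None:
            problems.append(f"no documented relationship between {left} and {right}")
    return problems

if __name__ == "__main__":
    proposed = [("customer", "transaction"), ("transaction", "inventory")]
    print(validate_llm_plan(proposed))  # flags the undocumented transaction-to-inventory hop
```

Constraining the agent to documented relationships keeps its compositions explainable and auditable, which matters as much as accuracy in a regulated setting.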
Building AI-enhanced feedback loops in financial data systems
The sophisticated data interpretation capabilities of AI demand a fundamental redesign of how organizations understand and use data. Traditional systems approach data quality as a linear process, but the reality revealed by AI is far more complex and interconnected, requiring a complete reimagining of feedback mechanisms.
Rather than simply applying AI to existing broken processes, we must rethink data quality from first principles. Effective systems are designed with clear feedback mechanisms that optimize for the right signals while minimizing noise. While human involvement remains necessary, it often introduces more complexity and slows down cycle times.
For financial services, AI becomes more than an automation tool. It offers a way to:
Maintain signal quality
Increase feedback loop speed
Provide more nuanced insights
Potentially replace manual interventions in specific scenarios
Organizational roles in an AI-augmented data ecosystem
Traditional organizational structures are no longer sufficient for managing increasingly complex data ecosystems. Effective data management now requires a collaborative approach that breaks down long-standing silos and creates more adaptive, intelligence-driven team structures.
Domain data teams: The foundation
Domain data teams serve as primary stewards of specific data areas like mortgage portfolios, credit card transaction data, or investment holdings. They establish baseline expectations for normal behavior and provide crucial context when anomalies are detected.
Federated data teams: The connectors
Federated data teams manage the cross-system infrastructure that makes modern data composition possible. They maintain technical foundations for anomaly detection and spot patterns that might be invisible when examining individual domains in isolation.
Cross-domain data teams: The AI innovators
In financial institutions, risk management teams serve as a crucial cross-domain function, combining data from multiple systems to monitor risk exposure. Financial control teams bridge multiple domains by reconciling data across trading, settlement, and accounting systems.
GenAI initiatives represent a particularly demanding type of cross-domain work, integrating data from across the enterprise while maintaining extremely high quality standards.
Intelligent stakeholder engagement through conversational AI
The breakdown of traditional organizational data structures points to a more profound transformation in how complex data insights become accessible to everyone. Conversational AI interfaces represent the next critical evolution, turning data quality from a technical challenge into a collaborative, intuitive dialogue that transcends traditional barriers.
Consider these transformative scenarios (a minimal sketch of the underlying pattern follows the list):
A risk manager asks the AI: "Why are we seeing higher default rates in the Northwest region?" The system correlates data with economic indicators, recent policy changes, and historical patterns.
A compliance officer requests: "Show me all potential GDPR issues in our customer data from the last quarter." The AI agent identifies compliance gaps and drafts resolution plans.
An executive challenges a metric: "This revenue growth doesn't align with market trends." The AI agent traces data lineage and suggests validation methods.
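The sketch below shows, in deliberately simplified form, the tool-calling pattern such an interface could sit on: natural-language questions are routed to data quality tools, each of which would in practice be backed by lineage, anomaly detection, and compliance services. Every function, string, and routing rule here is a hypothetical stub; a production agent would use an LLM with function calling to choose the tool.

```python
# Hedged sketch of a tool-calling conversational agent. All tools return canned
# strings here; real implementations would query lineage and monitoring systems.
def correlate_default_rates(region: str) -> str:
    return f"Default-rate rise in {region} correlates with a local unemployment uptick and a recent underwriting policy change."

def scan_privacy_gaps(quarter: str) -> str:
    return f"3 datasets touched in {quarter} lack documented consent lineage; draft remediation plan attached."

def trace_metric_lineage(metric: str) -> str:
    return f"{metric} aggregates 4 upstream feeds; one feed was restated last month, explaining the divergence from market trends."

TOOLS = {
    "default_rates": correlate_default_rates,
    "privacy_scan": scan_privacy_gaps,
    "metric_lineage": trace_metric_lineage,
}

def answer(question: str) -> str:
    # Placeholder keyword routing; a production agent would let the LLM pick the tool.
    if "default rates" in question.lower():
        return TOOLS["default_rates"]("Northwest")
    if "gdpr" in question.lower():
        return TOOLS["privacy_scan"]("last quarter")
    return TOOLS["metric_lineage"]("revenue growth")

if __name__ == "__main__":
    print(answer("Why are we seeing higher default rates in the Northwest region?"))
```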
This conversational approach democratizes data quality management and accelerates issue resolution. It's not just about detecting anomalies – it's about making those insights accessible and actionable for everyone from technical data stewards to executive stakeholders, all speaking the universal language of conversation.
From concept to reality: AI data agents in action
Our prototype demonstrates a conversational data quality system that goes beyond traditional dashboards. By letting users interact naturally with complex financial datasets, it shows how AI can:
Ask and answer nuanced questions about data anomalies
Provide contextual explanations of quality issues
Collaboratively develop resolution strategies
What's particularly powerful is the system's ability to connect previously siloed data domains, providing context invisible in individual systems.
Conclusion: The AI-powered data quality future
As financial data ecosystems become increasingly complex, AI-powered anomaly detection must continually evolve. The ability to detect meaningful patterns while filtering out spurious correlations will distinguish leading institutions.
Data quality professionals must develop new skills at the intersection of data science, machine learning, and business strategy. The path forward lies in building systems that combine human expertise with machine intelligence, creating virtuous cycles of continuous improvement.
Financial institutions that master this delicate balance – maintaining robust controls while enabling innovative data composition – will be best positioned to thrive in an increasingly data-driven future.