August 2025

      Why AI governance can’t be an afterthought

AI is reshaping organisations, from the ways firms engage with customers to how they assess risk and make decisions. AI assistants, copilots and chatbots are being embedded across processes, workflows and lifecycles to automate decisions, improve customer experiences and reduce operational friction.

However, whilst firms are eager to harness the benefits of AI, governance and control are lagging adoption. Given the multi-faceted complexity of AI risks, effective governance, risk management and validation must sit at the heart of evolving traditional model governance to meet new AI challenges and risks, particularly bias, lack of explainability and model hallucinations, which traditional risk frameworks do not consider.

AI governance should balance innovation with oversight using a proportionate and risk-based approach, as not every AI deployment carries the same risk. Applications used for research and innovation (e.g. ChatGPT to explore trends) may not require rigorous validation. However, once AI tools start to influence production decisions such as loan approvals, calculating affordability or setting credit thresholds, a greater level of control and thorough end-to-end validation become essential.

      This paper presents a practical framework for validating AI models in decision making, particularly where customer and regulatory impacts are likely to be significant.

      AI model validation is business-critical

Model validation is not just a regulatory requirement. When done right, it supports:

      • Customer fairness and trust through transparent decision making.
      • Stronger business outcomes by reducing misclassification and defaults, and supporting growth.
      • Audit readiness through traceable, explainable decisions.
• Regulatory trust by meeting PRA and FCA expectations, GDPR requirements and global AI principles.
      • Risk mitigation by catching data drift, bias or model failure early.

      UK banking use case: AI-led lending journey

Consider the example of a UK bank implementing an AI-enabled digital assistant to support the lending journey. The AI system enhances customer and business outcomes across key stages.

Scenario: a customer applies for a loan using an AI-enabled digital assistant, with AI supporting the journey through five stages, from data collection through to ongoing monitoring.

Why it matters: each stage introduces risk, from inaccurate data to unclear model logic. Strong validation ensures that each AI component performs as expected.

      AI-led lending journey – model validation


      Data quality and integrity

AI models are only as good as the data on which they rely. Robust and resilient practices should be implemented to safeguard AI solutions in terms of both data quality and security. Unlike traditional systems, AI models ingest and act on data in real time, often collected via an automated assistant, which heightens the risk of inaccurate inputs leading to flawed decisions. This increases both the velocity and the impact of risk: issues can crystallise faster and at greater scale.

      In lending, poor data quality such as unverified self-employed income, outdated KYC records or missing liabilities can lead to inaccurate risk classification. Whilst these risks are not new, if AI is processing multiple applications and making errors, the size and scale of an issue could quickly become significant. Moreover, AI systems can make and propagate decisions autonomously, increasing the requirement for robust data governance.

An AI-enhanced lending platform uses Open Banking data to automatically retrieve a customer’s income and transaction history. For self-employed individuals, the AI model interprets PDF bank statements and categorises inflows to estimate income. However, the AI misclassifies irregular income as non-recurring, leading to underestimation of creditworthiness and incorrect risk classification, with decisions made without appropriate human review or validation.

Mitigation could include implementing data encryption, access controls and anonymisation techniques to safeguard sensitive information, with escalation workflows for certain cases (e.g. self-employed income), and aligning to regulations to ensure compliance and promote more accurate and fairer lending decisions.
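As a minimal sketch of what such controls could look like in practice, the Python example below applies simple rule-based data-quality checks to categorised Open Banking inflows and routes irregular self-employed income to a human reviewer rather than automated classification. The field names, thresholds and escalation rule are illustrative assumptions, not a prescribed design.

```python
# Illustrative sketch only: rule-based data-quality gates for AI-ingested income data.
# Field names, thresholds and the escalation rule are hypothetical assumptions.
from dataclasses import dataclass


@dataclass
class IncomeRecord:
    applicant_id: str
    employment_type: str          # e.g. "employed", "self-employed"
    monthly_inflows: list[float]  # categorised inflows from Open Banking / statements
    kyc_last_verified_days: int   # age of the latest KYC verification


def validate_income_record(rec: IncomeRecord) -> dict:
    issues, escalate = [], False

    # Completeness: no automated decision on missing income data.
    if not rec.monthly_inflows:
        issues.append("missing income data")
        escalate = True

    # Staleness: outdated KYC records block automated classification.
    if rec.kyc_last_verified_days > 365:
        issues.append("KYC record older than 12 months")
        escalate = True

    # Volatility: highly irregular inflows (common for self-employed applicants)
    # are routed to a human reviewer instead of being auto-classified as non-recurring.
    if rec.monthly_inflows:
        mean = sum(rec.monthly_inflows) / len(rec.monthly_inflows)
        spread = max(rec.monthly_inflows) - min(rec.monthly_inflows)
        if rec.employment_type == "self-employed" and mean > 0 and spread / mean > 0.5:
            issues.append("irregular self-employed income")
            escalate = True

    return {"applicant_id": rec.applicant_id,
            "issues": issues,
            "route": "human_review" if escalate else "automated_decision"}


# Example: irregular self-employed inflows are escalated rather than auto-scored.
print(validate_income_record(
    IncomeRecord("A-001", "self-employed", [900.0, 3200.0, 1500.0], 90)))
```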


      Model explainability and transparency

AI credit risk models, particularly those using complex machine learning (ML) or large language models (LLMs), often behave like black boxes. This makes it difficult for lenders, underwriters or regulators to understand why a loan was approved or declined. In AI-powered journeys, such as agentic AI systems handling pre-screening and automated classification, the requirement for clear, interpretable decisions becomes even more critical.

Unlike traditional models, AI decisions may not be easily traceable or explainable without dedicated tooling to capture the data, the state of the model at the time of the decision and the output, as all three elements will change and evolve over time. This can undermine customer trust and expose institutions to regulatory scrutiny.

An AI model classifies an applicant as high risk without providing an explanation. The AI-driven virtual assistant may inform the customer of the decision, but the rationale behind the adverse decision is unclear. The applicant disputes the outcome, and the firm is unable to provide sufficient justification, exposing a gap in explainability and regulatory compliance.

Mitigation could include the use of explainability tools, copilots and standard templates to justify decisions, with tooling used to capture and maintain end-to-end evidence. Digital assistants can also be trained for compliant communications and integrated into audit trails to meet regulatory expectations. Where needed, decisions should be escalated to a human reviewer for further explanation or input validation.
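For illustration, the sketch below uses the open-source SHAP library (referenced later in this paper alongside LIME) to turn an individual decision from a tree-based credit model into ranked reason codes. The model, feature names and data are synthetic placeholders; a production implementation would depend on the firm's own model stack and documentation standards.

```python
# Illustrative sketch: per-decision reason codes via SHAP for a tree-based credit model.
# Features, data and model choice are hypothetical; assumes `shap` and `scikit-learn`.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
features = ["income", "debt_to_income", "missed_payments", "loan_amount"]

# Synthetic stand-in for historical lending outcomes (1 = defaulted / high risk).
X = rng.normal(size=(500, len(features)))
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = GradientBoostingClassifier().fit(X, y)

# Explain one adverse decision: which inputs pushed the score towards "high risk"?
explainer = shap.TreeExplainer(model)
applicant = X[:1]
contributions = explainer.shap_values(applicant)[0]

reason_codes = sorted(zip(features, contributions), key=lambda fc: abs(fc[1]), reverse=True)
for name, value in reason_codes:
    direction = "increased" if value > 0 else "decreased"
    print(f"{name}: {direction} the modelled risk (SHAP value {value:+.3f})")
```

Reason codes of this kind can feed the standard templates and audit trails described above, so that an adverse decision can be justified to the customer and evidenced to a reviewer.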


      Performance and accuracy

      AI models must remain robust under varying market conditions, economic downturns, shifting borrower behaviours and data drift. Unlike traditional models, AI-powered systems can be affected by slight changes in input data patterns, which may lead to a decline in performance if not properly monitored.

      In AI-led lending journeys, decisions such as eligibility, pricing or referrals are made autonomously. If the model overfits past trends or lacks resilience under stress scenarios, it can lead to faulty segmentation, misclassifying borrowers and amplifying credit losses.

During a downturn, the AI model underestimates default risk in medium-risk applicants because of reliance on outdated segmentation logic. As economic conditions change, borrower behaviour shifts, but the model fails to recalibrate in real time, resulting in unexpected losses.

Mitigation could include stress testing AI models and using scenario analysis to monitor performance, with copilots used to compare outputs against challenger models. The outcomes of retraining should then be reviewed to assess resilience under market shifts and volatility.
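A minimal sketch of a champion/challenger comparison under a stressed scenario is shown below. The synthetic downturn shift and the model choices are assumptions made for illustration; in practice, stressed data would come from the firm's scenario analysis and economic models.

```python
# Illustrative sketch: champion vs challenger credit models under a stressed scenario.
# Data, features and the stress shifts are hypothetical assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)


def make_portfolio(n, income_shift=0.0, arrears_shift=0.0):
    """Synthetic applicants; the shifts mimic falling income and rising arrears in a downturn."""
    income = rng.normal(income_shift, 1.0, n)
    arrears = rng.normal(arrears_shift, 1.0, n)
    default = (0.8 * arrears - 0.6 * income + rng.normal(0, 0.7, n) > 0).astype(int)
    return np.column_stack([income, arrears]), default


X_train, y_train = make_portfolio(5_000)                       # benign conditions
X_stress, y_stress = make_portfolio(2_000, income_shift=-0.7,  # downturn scenario
                                    arrears_shift=0.8)

champion = LogisticRegression().fit(X_train, y_train)
challenger = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_train, y_train)

# Discriminatory power under stress: a material gap would prompt review or retraining.
for name, model in [("champion", champion), ("challenger", challenger)]:
    auc = roc_auc_score(y_stress, model.predict_proba(X_stress)[:, 1])
    print(f"{name}: stressed AUC = {auc:.3f}")
```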


      Bias and fairness auditing

AI models can unintentionally learn and amplify historical inequalities in their training data, resulting in unfair treatment of specific demographics, postcodes or income groups. Unlike traditional methods, AI may exploit hidden correlations, with variables such as income or location acting as proxies for protected characteristics, complicating fairness detection and correction. For example, models trained on income thresholds may inadvertently deprioritise applicants from certain postcodes where socioeconomic disadvantage is concentrated, even without explicit intent. This is not because of the postcode itself, but the structural inequalities it reflects, such as limited access to quality education, employment and housing.

Given the growing regulatory requirements on explainability and fairness, such as the FCA's Consumer Duty, the EU AI Act and GDPR's non-discrimination provisions, institutions are required to implement proactive fairness auditing and mitigation strategies throughout the AI lifecycle.

      The AI model consistently assigns higher interest rates to applicants from specific regions, despite similar financial and credit characteristics. Internal escalation flags this as a potential fairness breach. On further review, postcode data and interaction timing are found to act as hidden proxies for ethnicity and income, introducing algorithmic bias.

Mitigation could include scenario analysis across borrower segments, product types and geographies to evaluate performance, or establishing challenger models or simulation environments to compare AI outputs under multiple economic paths.
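As a simple illustration of fairness auditing, the sketch below compares approval rates across postcode-derived groups and flags any group falling below a commonly used 80% screening threshold relative to a reference group. The groups, decisions and threshold are hypothetical; a real audit would cover multiple metrics and protected characteristics.

```python
# Illustrative sketch: disparate impact check on approval rates by postcode-derived group.
# Group labels, decisions and the 80% screening threshold are hypothetical assumptions.
import numpy as np


def disparate_impact(decisions: np.ndarray, groups: np.ndarray, reference: str) -> dict:
    """Ratio of each group's approval rate to the reference group's approval rate."""
    ref_rate = decisions[groups == reference].mean()
    return {g: float(decisions[groups == g].mean() / ref_rate) for g in np.unique(groups)}


# Hypothetical model outputs: 1 = approved, 0 = declined, tagged by postcode area.
decisions = np.array([1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1])
postcode_area = np.array(["A"] * 8 + ["B"] * 8)

for group, ratio in disparate_impact(decisions, postcode_area, reference="A").items():
    flag = "REVIEW" if ratio < 0.8 else "ok"  # common 80% rule-of-thumb screening threshold
    print(f"postcode area {group}: approval ratio vs reference = {ratio:.2f} [{flag}]")
```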


      Ongoing monitoring and model drift detection

AI models are not static assets: they degrade over time as borrower behaviours evolve, macroeconomic conditions shift, and data pipelines change. Without active monitoring, even high-performing models can drift, leading to inaccurate credit decisions and regulatory non-compliance.

Unlike traditional models, Gen AI and ML systems require ongoing checks, human-in-the-loop oversight and embedded drift controls to remain trustworthy after deployment.

The AI model applies outdated eligibility rules despite rising inflation and deteriorating borrower affordability. This oversight results in misclassified approvals and a spike in early defaults, triggering post-implementation remediation.

Mitigation could include using AI observability tools alongside business feedback for real-time drift detection across data, performance and fairness metrics, and aligning monitoring cycles with PRA SS1/23 (Model Risk Management Principles) and Basel III model validation standards.
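As one example of the kind of drift metric observability tooling commonly exposes, the sketch below computes a Population Stability Index (PSI) between the data seen at model build and recent live data for a single input. The data and alert thresholds are illustrative assumptions; in practice such checks would run across all material inputs and model outputs.

```python
# Illustrative sketch: Population Stability Index (PSI) as a simple drift check
# between baseline (model-build) data and recent live data for one model input.
# Bin count, data and alert thresholds are hypothetical assumptions.
import numpy as np


def population_stability_index(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    """PSI between a baseline (expected) and a recent (actual) distribution."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Keep out-of-range live values inside the baseline bins for this simple sketch.
    actual = np.clip(actual, edges[0], edges[-1])
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Guard against empty bins before taking logs.
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))


rng = np.random.default_rng(2)
baseline_affordability = rng.normal(1.00, 0.20, 10_000)  # distribution at model build
recent_affordability = rng.normal(0.85, 0.25, 2_000)     # after an affordability shock

psi = population_stability_index(baseline_affordability, recent_affordability)
status = "significant drift" if psi > 0.25 else "moderate drift" if psi > 0.10 else "stable"
print(f"PSI = {psi:.3f} ({status})")  # common rule of thumb: PSI > 0.25 warrants review
```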

      Building a trustworthy AI validation framework

Chatbots, copilots and other AI-driven models hold immense promise for transforming lending and risk management. But this innovation brings increased complexity, opacity and new systemic risks.

A robust AI validation framework must go beyond traditional testing and embed AI-specific assurance:

• Regulatory alignment to support compliant AI deployment (e.g. Basel III, IFRS 9, PRA, FCA, GDPR, the EU AI Act and global AI principles).
      • Transparent, interpretable decision making supported by explainability tools (e.g. SHAP, LIME, Copilots).
      • Fairness auditing to mitigate bias across demographic segments, using benchmark datasets and escalation triggers.
      • Resilient performance monitoring to help foster robustness in dynamic economic and borrower environments.
• Continuous oversight using real-time drift detection, retraining pipelines and human-in-the-loop governance.

By validating each stage of the AI-led lending journey, from data collection to ongoing model monitoring, organisations can balance innovation with trust. This integration of traditional model risk control with next-generation AI tooling is key to scaling AI responsibly within organisations.

      How KPMG can help

      KPMG Trusted AI and risk and regulatory professionals can support all aspects of AI model governance, including:

      • Model validation across fairness, explainability, performance and robustness, including the use of challenger models to independently assess model decisions and assumptions.
      • Governance frameworks, helping to align roles, policies and controls to regulatory expectations.
      • Tooling support, including bias detection, explainability, Model Risk Management platforms and challenger model development and benchmarking.
      • Regulatory readiness – gap assessments for PRA, FCA and upcoming AI regulations.
• Tailored training sessions for boards, developers and model reviewers.

      To explore these services or discuss your use cases, please contact Robert Smith, Andrew Fulton, Rajvinder Bains or Douglas Dick.



Our people

Robert Smith, Partner and UK Head, Regulatory & Risk Advisory, KPMG in the UK

Andrew Fulton, Partner, Regulatory and Risk Advisory, KPMG in the UK

Douglas Dick, Head of Emerging Technology Risk, United Kingdom

Michelle Adcock, Director, FS Regulatory Insight Centre, Risk and Regulatory Advisory, KPMG in the UK