October 22, 2020 in Artificial Intelligence

Ensuring Responsible Use of Artificial Intelligence

‘AI explainability’ and ‘fairness assessments’ help surface potential modeling problems for data scientists.

Anupam Datta


https://doi.org/10.1287/LYTX.2020.06.02

The use of machine learning (ML) is exploding, but ML has a big flaw: many models are black boxes. Even when they work, data scientists don’t necessarily know why. This becomes a problem when models break, or when regulators or consumers ask questions about a result.

It’s also an ethical issue because ML models are now being used to make decisions that significantly impact people’s lives. Examples include credit decisions, insurance underwriting, healthcare diagnostics, recidivism predictions and facial recognition for law enforcement. Responsible and ethical adoption of machine learning and artificial intelligence (AI) is crucial to ensuring that the people who are subject to these models are treated fairly, and that they have recourse when it’s needed.

How can organizations adopt AI responsibly? Two key factors are AI explainability and fairness assessments. Both matter while data scientists are developing a model and on an ongoing basis once it is in production, when it is critical to explain the individual decisions the model makes and to monitor it for fairness.

AI explainability is the science of understanding what drives the outputs of ML models. It underpins responsible AI: it helps model developers understand the drivers inside their models and can isolate individual variables and their impact on model performance. Without AI explainability, models remain black boxes.
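One widely used way to isolate a single variable’s impact on model performance is permutation importance: shuffle one feature’s column, which breaks its link to the outcome, and measure how much accuracy drops. The sketch below is a minimal, standard-library illustration of that one technique, not necessarily what any particular explainability tool does; the toy model, data and feature layout are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Average drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy the rows and overwrite feature j with the shuffled values.
            X_perm = [list(row) for row in X]
            for row, value in zip(X_perm, column):
                row[j] = value
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy model: approves when income (feature 0) exceeds 50 and
# ignores feature 1 entirely.
def toy_model(row):
    return 1 if row[0] > 50 else 0


X = [[30, 1], [40, 0], [60, 1], [80, 0], [55, 1], [20, 0]]
y = [toy_model(row) for row in X]
print(permutation_importance(toy_model, X, y))
```

Because the toy model never reads feature 1, shuffling it leaves every prediction unchanged and its importance is exactly 0, while shuffling income lowers accuracy on average.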

AI explainability provides essential information not only to model developers but also to the people affected by the models – for instance, a person who is denied credit because of a model’s output. It helps determine whether that output was fair and appropriate or the result of an error in the data, which the person might then be able to correct to get a better outcome.

Transparency and Recourse

The Fair Credit Reporting Act (FCRA) promotes the accuracy, fairness and privacy of information in the files of consumer reporting agencies. Provisions include the right to dispute inaccurate or incomplete information. This requires companies to be transparent about how they arrive at credit conclusions.

When organizations such as banks used simple linear models to reach those conclusions (as they did for many years, and some still do), transparency was relatively easy. When a person inquired, a credit analyst could look at the simple inputs of the linear model and explain why the person was, for example, denied credit. If the person was unhappy with the decision or felt it wasn’t fair, they had recourse: they could appeal the decision, or use that information to improve their credit and then reapply.
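A linear model is easy to explain because its score is a weighted sum, so the gap between an applicant’s score and a reference applicant’s score splits exactly into one term per feature: the weight times the difference between the two values. The most negative terms are natural candidates for the reasons given to a denied applicant. In this sketch, the feature names, weights, threshold and reference applicant are all hypothetical.

```python
# Hypothetical linear credit score; names and weights are illustrative only.
WEIGHTS = {"income_k": 0.04, "debt_to_income": -3.0, "late_payments": -0.8}
INTERCEPT = -1.0
THRESHOLD = 0.0  # approve when the score is at or above this value


def score(applicant):
    """Linear score: intercept plus the weighted sum of the features."""
    return INTERCEPT + sum(w * applicant[f] for f, w in WEIGHTS.items())


def reason_codes(applicant, reference, top_k=2):
    """Per-feature contributions to score(applicant) - score(reference), most negative first."""
    contributions = {f: w * (applicant[f] - reference[f]) for f, w in WEIGHTS.items()}
    return sorted(contributions.items(), key=lambda item: item[1])[:top_k]


denied = {"income_k": 40, "debt_to_income": 0.5, "late_payments": 3}
typical_approved = {"income_k": 60, "debt_to_income": 0.3, "late_payments": 0}
print(score(denied) >= THRESHOLD)               # False: the applicant is denied
print(reason_codes(denied, typical_approved))   # late_payments first, then income_k
```

Because the model is linear, the per-feature terms add up exactly to the score difference, which is what lets an analyst read the explanation straight off the model.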

But now ML models are making these decisions, and they are far more complex. An analyst can’t look at an ML model and translate its decision for a consumer. Companies now need AI explainability tools to do that.

Financial institutions also need to comply with the Equal Credit Opportunity Act (ECOA), which prohibits credit discrimination on the basis of race, color, religion, national origin, sex, marital status, age or because a person receives public assistance. When financial institutions use ML models to make decisions, they need to be able to prove that these factors did not come into play. AI explainability tools are required to do that as well.
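One simple check related to this requirement is a counterfactual flip test: change only the protected attribute for each applicant and see whether the decision changes. The sketch below uses hypothetical models and applicant records. Passing the test only shows that the model does not use the attribute directly; it cannot rule out indirect effects through correlated features, which is one reason fairness assessments are also needed.

```python
def flip_test(model, applicants, attribute, values):
    """Return applicants whose decision changes when only `attribute` is varied."""
    flagged = []
    for applicant in applicants:
        # Build one counterfactual per value, leaving every other field unchanged.
        decisions = {model({**applicant, attribute: v}) for v in values}
        if len(decisions) > 1:
            flagged.append(applicant)
    return flagged


# Hypothetical models: model_a ignores sex; model_b improperly uses it directly.
def model_a(applicant):
    return applicant["income_k"] >= 50


def model_b(applicant):
    cutoff = 50 if applicant["sex"] == "M" else 60
    return applicant["income_k"] >= cutoff


applicants = [
    {"income_k": 55, "sex": "F"},
    {"income_k": 70, "sex": "M"},
    {"income_k": 40, "sex": "F"},
]
print(flip_test(model_a, applicants, "sex", ["F", "M"]))  # []
print(flip_test(model_b, applicants, "sex", ["F", "M"]))  # [{'income_k': 55, 'sex': 'F'}]
```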

The ECOA also expects financial institutions to provide adverse action notices that answer questions such as, “Why was Jane denied credit when compared to a set of people who were approved?” Consumers also often expect answers to questions such as, “Why was Jane denied credit when John was approved?” Better AI explainability might have helped Apple in late 2019, when it was widely reported that a woman received a significantly lower credit line than her husband with no clear explanation [1].
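Both questions can be framed as attributing the difference between Jane’s score and a comparison score to individual features. One common model-agnostic approach, shown here as a sketch rather than as the method any particular tool uses, is Shapley values: features outside a coalition take a comparison applicant’s value, and averaging over several approved applicants answers the group question. The scoring function and all applicant data are hypothetical, and exact enumeration grows exponentially with the number of features, so this only suits small examples.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baselines):
    """Exact Shapley attribution of model(x) minus the mean model output on `baselines`."""
    n = len(x)

    def value(coalition):
        # Features in the coalition take x's value; the rest take each baseline's
        # value, and the resulting model outputs are averaged over the baselines.
        total = 0.0
        for b in baselines:
            mixed = [x[j] if j in coalition else b[j] for j in range(n)]
            total += model(mixed)
        return total / len(baselines)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical nonlinear score over [income, debt ratio, late payments].
def black_box_score(row):
    income, debt_ratio, late = row
    return 2.0 * income * (1 - debt_ratio) - 0.5 * late ** 2


jane = [3.0, 0.5, 2.0]
john = [4.0, 0.25, 0.0]
ann = [5.0, 0.5, 1.0]
print(shapley_values(black_box_score, jane, [john]))       # Jane vs. John
print(shapley_values(black_box_score, jane, [john, ann]))  # Jane vs. an approved set
```

The attributions always sum to Jane’s score minus the average comparison score, but the answer depends on who Jane is compared with: in this toy data, debt ratio weighs more heavily than income against John alone, while income weighs more heavily against the two-person approved set.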

Fairness Assessments

Years ago, redlining – explicitly excluding certain neighborhoods from loan consideration – was a common practice. Similar exclusion still happens today, though mostly unintentionally, and fairness assessments can help prevent it.

Fairness assessments help ascertain whether a model consistently produces worse outcomes for a protected group (e.g., one defined by race or gender) and, if so, why it is happening and whether it is justified. The objective is to prevent unfair bias in models proactively.

To perform these assessments, organizations need to measure whether the model’s outcomes are significantly worse for a protected group and, if so, understand what is causing the disparity. The natural next step is then to act to mitigate any unfair bias.

As an illustrative example, consider an ML model used by a bank to determine risk for personal loans. The bank wanted to ensure that gender was not a factor in determining model output. Here’s how a data science team would apply the measure/understand/act process:

Measure – The data science team determined that the model was providing far worse outcomes for women than for men. Standard metrics can automatically quantify how different the approval rates are between two groups. For example, one metric, called the disparate impact ratio, divides the approval rate for women by the approval rate for men. If the ratio falls below a threshold (e.g., 80%), indicating that the disparity is significant, the data science team is alerted to dig deeper.
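A minimal sketch of the measure step, assuming binary decisions (1 for approve, 0 for deny) and one group label per applicant. The 0.8 default mirrors the 80% example above, a threshold often called the four-fifths rule; the decisions are made up for illustration.

```python
def approval_rate(decisions, groups, group):
    """Share of applicants in `group` whose decision is 1 (approved)."""
    selected = [d for d, g in zip(decisions, groups) if g == group]
    return sum(selected) / len(selected)


def disparate_impact_ratio(decisions, groups, protected, reference):
    """Approval rate of the protected group divided by that of the reference group."""
    return (approval_rate(decisions, groups, protected)
            / approval_rate(decisions, groups, reference))


def needs_review(ratio, threshold=0.8):
    """Flag the model for a closer look when the ratio falls below the threshold."""
    return ratio < threshold


# Hypothetical decisions for five women followed by five men.
decisions = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
groups = ["F"] * 5 + ["M"] * 5
ratio = disparate_impact_ratio(decisions, groups, protected="F", reference="M")
print(ratio, needs_review(ratio))  # 0.5 True
```

Here women are approved 40% of the time and men 80%, so the ratio of 0.5 falls well below the threshold and would trigger the deeper investigation.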

Understand – The next step was for the data science team to understand why the model viewed women as higher risk than men. Recent technical advances in understanding the inner workings of ML models can answer this “why” question, so the team could quickly pinpoint the features contributing to the disparate impact ratio. It discovered that the marital status [2] feature was the main driver of the disparity.
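Which attribution technique to use is a design choice. As a simplified stand-in, if the model’s score were linear, the gap in average score between women and men would split exactly into one term per feature: the feature’s weight times the difference in its group averages (the intercept cancels). The weights and records below are hypothetical, chosen so that, as in the example, marital status drives the gap.

```python
# Hypothetical linear score weights; a higher score means lower predicted risk.
WEIGHTS = {"married": 1.2, "income_k": 0.03}


def mean(values):
    return sum(values) / len(values)


def score_gap_by_feature(rows, groups, protected, reference):
    """Split mean score(protected) - mean score(reference) into per-feature terms."""
    contributions = {}
    for feature, weight in WEIGHTS.items():
        prot = mean([r[feature] for r, g in zip(rows, groups) if g == protected])
        ref = mean([r[feature] for r, g in zip(rows, groups) if g == reference])
        contributions[feature] = weight * (prot - ref)
    return contributions


rows = [
    {"married": 0, "income_k": 50}, {"married": 0, "income_k": 60},
    {"married": 1, "income_k": 55}, {"married": 0, "income_k": 55},
    {"married": 1, "income_k": 55}, {"married": 1, "income_k": 50},
    {"married": 1, "income_k": 60}, {"married": 0, "income_k": 55},
]
groups = ["F"] * 4 + ["M"] * 4
print(score_gap_by_feature(rows, groups, protected="F", reference="M"))
```

In this toy data both groups have the same average income, so the entire gap comes from marital status.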

Drilling down further, the team found that the model viewed married individuals as lower risk, and that the proportion of married men in the dataset used to train the model was significantly higher than the proportion of married women.

Act – Armed with this actionable information, the bank can make decisions with greater confidence. For example, the data science team can use this understanding to conclude that the training data was not representative of the population to which the model would be applied. The team can then mitigate the unfair bias by balancing the proportion of married men and women in the training data and retraining the model. This is just one of many unfair-bias mitigation strategies available to data scientists.
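Balancing can be done by resampling rows or, as in this sketch, by reweighting them, which avoids duplicating or discarding records. Each row gets the weight P(marital status) / P(marital status | gender), so that after weighting, married people make up the same share of women as of men. The data are hypothetical; the weights could then be passed to training, since many estimators accept per-row weights (for example, the sample_weight argument of fit in scikit-learn’s LogisticRegression).

```python
from collections import Counter


def balancing_weights(genders, married):
    """Weight each row by P(marital status) / P(marital status | gender)."""
    n = len(genders)
    overall = Counter(married)                # rows per marital status
    per_gender = Counter(genders)             # rows per gender
    joint = Counter(zip(genders, married))    # rows per (gender, marital status)
    return [
        (overall[m] / n) / (joint[(g, m)] / per_gender[g])
        for g, m in zip(genders, married)
    ]


def weighted_married_share(genders, married, weights, gender):
    """Weighted fraction of married rows within one gender."""
    married_weight = sum(w for g, m, w in zip(genders, married, weights) if g == gender and m)
    total_weight = sum(w for g, w in zip(genders, weights) if g == gender)
    return married_weight / total_weight


# Hypothetical training rows: 1 of 4 women and 3 of 4 men are married.
genders = ["F"] * 4 + ["M"] * 4
married = [0, 0, 1, 0, 1, 1, 1, 0]
weights = balancing_weights(genders, married)
for gender in ("F", "M"):
    print(gender, round(weighted_married_share(genders, married, weights, gender), 6))
# F 0.5
# M 0.5
```

Within each gender the weights sum to the original row count, so neither group gains or loses overall influence; only the married/unmarried mix within each group changes.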

Building a Culture for Responsible AI

Data scientists can’t ensure responsible use of AI without AI explainability and fairness assessments, which help surface potential model problems and also indicate how those problems might be mitigated.

These model evaluations should be happening thousands of times a day around the world – not only because it’s the right and responsible thing to do, but also because, in many sectors, it’s a regulatory requirement. With ML models – and not linear models or people – now making many decisions, regulated companies must have the tools in place to understand and explain those decisions. Banks need to be able to accurately explain to an applicant why a particular decision was made with respect to their credit. Insurance companies need to be able to explain why a policy was not renewed. Hiring managers need to be able to explain why a job candidate did not make the cut.

Much of this comes down to the culture of your data science team. If you establish a culture focused on accuracy only, your data scientists will focus only on accuracy as their lead metric. If instead you make it clear that responsible AI and fairness are the gold standard, data scientists will work toward those goals. Leaders must set the right KPIs.

The general public’s mistrust of AI will persist until more companies make responsible use of AI a priority.

Reference & Note

[1] https://www.cnbc.com/2019/11/14/apple-card-algo-affair-and-the-future-of-ai-in-your-everyday-life.html
[2] While in the U.S., marital status cannot be directly used for credit decisions, other jurisdictions permit its use.

This article appears in INFORMS Analytics Collections Vol. 16: Advances in Integrating AI & O.R.


Anupam Datta

Anupam Datta is co-founder, president and chief scientist at TruEra.
