June 3, 2021 in Last Word
Algorithm fairness
https://doi.org/10.1287/orms.2021.03.20
The automated collection and analysis of large data sets using artificial intelligence and machine learning (AI/ML) promises businesses and organizations better operations and decision-making. As a result, AI/ML-based decision support has been deployed across application areas, including highly equity-sensitive domains such as human resources and healthcare. In other words, data-driven decision support fueled by AI/ML now affects every aspect of our lives: who gets a job or a promotion, who receives a timely skin cancer diagnosis, who is granted parole and more.
While decision-support tools built on machine learning algorithms have tremendous potential to aid organizational decision-making, they can also damage individual lives and perpetuate social inequity. Researchers and practitioners across many application areas have voiced concerns about the harm such tools may cause and the lack of accountability associated with them, and with good reason.
Biased Algorithms
Amazon made headlines when it scrapped a resume-screening tool its engineers had built to select the best candidates from an applicant pool, after discovering that the tool penalized the resumes of female applicants for technical roles on the basis of gender-specific phrases like “women’s chess team” [1]. Because these technical jobs had been dominated by men, a model trained on previously hired, successful candidates learned to treat factors correlated with maleness as indicators of potential success.
Another example of inequitable machine learning comes from ProPublica’s 2016 release of data on COMPAS (Correctional Offender Management Profiling for Alternative Sanctions). COMPAS is a proprietary algorithm used to assess the risk of recidivism, and its risk scores can be used in court to help inform parole decisions. The data showed that Black defendants were more likely than white defendants to be falsely labeled as high risk. Conversely, white defendants were more likely than Black defendants to be falsely labeled as low risk for recidivism [2]. More recently, a study of bail decisions found that participants were more likely to adjust the recidivism risk scores of Black defendants upward and those of white defendants downward [3], indicating that putting a human in the decision-making loop can in some cases increase, rather than mitigate, bias.
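The COMPAS disparity is a difference in error rates between groups. As a hedged illustration, the sketch below computes the false positive rate (labeled high risk but did not reoffend) and the false negative rate (labeled low risk but did reoffend) separately for each group. The function name error_rates_by_group and the toy data are hypothetical; this is not ProPublica’s analysis code.

```python
from collections import defaultdict


def error_rates_by_group(groups, y_true, y_pred):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    y_true[i] is 1 if person i actually reoffended, else 0.
    y_pred[i] is 1 if the tool labeled person i high risk, else 0.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for g, actual, predicted in zip(groups, y_true, y_pred):
        c = counts[g]
        if actual == 0:
            c["neg"] += 1
            c["fp"] += predicted  # labeled high risk, did not reoffend
        else:
            c["pos"] += 1
            c["fn"] += 1 - predicted  # labeled low risk, did reoffend
    return {
        g: (c["fp"] / c["neg"] if c["neg"] else float("nan"),
            c["fn"] / c["pos"] if c["pos"] else float("nan"))
        for g, c in counts.items()
    }


# Hypothetical toy data for two groups, "A" and "B" (not the COMPAS data).
groups = ["A"] * 4 + ["B"] * 4
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 0, 0, 1]
print(error_rates_by_group(groups, y_true, y_pred))
# {'A': (0.5, 0.0), 'B': (0.0, 0.5)}
```

Both toy groups have the same 75% accuracy, yet group A bears all the false positives and group B all the false negatives, which is why overall accuracy alone cannot reveal this kind of disparity.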
A final example involves an ad for opportunities in science, technology, engineering and math (STEM) fields that had been carefully designed to be gender neutral. Once the ad was posted on social media, however, it was shown disproportionately to men [4]. The ad-allocation algorithm had been optimized for cost-effectiveness, showing the ad to as many people as possible within the given budget. Because the social network charged more to show ads to young women (who typically buy more through ads) than to young men, the algorithm steered the ad toward men.
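The mechanism can be reproduced with a deliberately simplified allocator. The sketch below assumes a greedy rule that buys the cheapest impressions first; it is not the social network’s actual ad-delivery algorithm, and the prices and audience sizes are made up for illustration. Nothing in the code refers to gender, yet the cheaper segment absorbs most of the budget.

```python
def allocate_impressions(budget, segments):
    """Greedy cost-minimizing allocation of an ad budget (hypothetical model).

    segments: list of (name, cost_per_impression, audience_size).
    Buys impressions in the cheapest segment first, up to its audience size,
    then moves to the next-cheapest segment until the budget runs out.
    Costs are integers (e.g., in cents) to avoid floating-point rounding.
    """
    allocation = {}
    for name, cost, audience_size in sorted(segments, key=lambda s: s[1]):
        bought = min(audience_size, budget // cost)
        allocation[name] = bought
        budget -= bought * cost
    return allocation


# Hypothetical prices and audience sizes, chosen only for illustration.
segments = [("young women", 2, 3000), ("young men", 1, 6000)]
print(allocate_impressions(10_000, segments))
# {'young men': 6000, 'young women': 2000}
```

With the women’s segment priced at twice the men’s, 6,000 of the 8,000 impressions (75%) go to men, even though the ad itself is gender neutral.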
As these scenarios highlight, the problem of machine learning bias and lack of algorithmic fairness has many roots, including biased data, unequal representation, unconscious biases, ill-defined objective functions and lack of real-time monitoring.
Fixing the Analytical Process
On the surface, the analytical process is straightforward and objective: collect (at least arguably) objective data, build models from those data, then incorporate the models into decision-making. However, as most now understand, this process is permeated with human judgment, and bias can enter at any point. It is therefore critical to apply bias-aware decision-making processes that account for the environments in which they are used. Who are the stakeholders? Who is directly affected? Who is affected but not represented in the data? Acknowledging the importance of the environment in which an algorithm operates, we focus here on data and modeling choices within a bias-aware analytical process (see Figure 1), which starts with the data. Data biases can stem from multiple sources: the data can be inadequately representative, data elements can be fundamentally biased (e.g., performance evaluations), data can reflect prior biased decisions, or the data may use proxies that benefit one group over another.
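Of the data-bias sources just listed, inadequate representation is the easiest to check mechanically. A minimal sketch, assuming a reference distribution for the population the model will serve is available: compare each group’s share of the records with its reference share. The function name and the reference shares are hypothetical.

```python
def representation_gaps(groups, reference_shares):
    """Each group's share of the data minus its share of a reference population.

    groups: the group label of each record in the data set.
    reference_shares: {group: share} for the population the model will serve
    (e.g., an applicant pool); choosing this reference is a judgment call.
    Negative values indicate under-representation in the data.
    """
    n = len(groups)
    return {g: groups.count(g) / n - share for g, share in reference_shares.items()}


# Hypothetical data: group B is 50% of the population but 20% of the records.
groups = ["A"] * 8 + ["B"] * 2
gaps = representation_gaps(groups, {"A": 0.5, "B": 0.5})
print({g: round(v, 2) for g, v in gaps.items()})
# {'A': 0.3, 'B': -0.3}
```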
Algorithms then interact with the data, producing a model that eventually becomes the foundation of a decision-support tool. In this context, it is important to understand that models typically perform best for the majority group: most off-the-shelf algorithms optimize overall accuracy, and performance on the majority group carries the most weight in that measure.
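The dominance of the majority group follows from simple arithmetic: pooled accuracy is a weighted average of group accuracies, with each group weighted by its share of the data. The sketch below uses hypothetical group sizes and accuracies, and the function name pooled_accuracy is illustrative.

```python
def pooled_accuracy(group_stats):
    """Overall accuracy as the size-weighted average of per-group accuracies.

    group_stats: {group: (number_of_examples, accuracy_within_group)}.
    """
    total = sum(n for n, _ in group_stats.values())
    return sum(n * acc for n, acc in group_stats.values()) / total


# Hypothetical: 900 majority-group and 100 minority-group examples.
baseline = pooled_accuracy({"majority": (900, 0.95), "minority": (100, 0.70)})
improved = pooled_accuracy({"majority": (900, 0.95), "minority": (100, 0.90)})
print(round(baseline, 3), round(improved, 3))
# 0.925 0.945
```

Raising minority-group accuracy by 20 percentage points moves the pooled figure by only 2 points, so an algorithm that optimizes overall accuracy has little incentive to close the gap.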
Unfortunately, simply removing demographic variables from the data will not produce gender- or race-blind models. Even without the demographics, the data will still contain other factors that correlate with both the missing demographic variables and the outcome. In lending, for example, a model that excludes race may still include variables such as ZIP code that carry information about an applicant’s race, and an algorithm will pick up on these correlations.
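One hedged way to see whether a retained variable acts as a proxy is to ask how well it predicts the removed attribute. The sketch below, with hypothetical ZIP codes and group labels, compares guessing each record’s group from its ZIP code against a baseline that ignores ZIP code entirely. The function name proxy_strength is illustrative.

```python
from collections import Counter, defaultdict


def proxy_strength(zip_codes, groups):
    """How well ZIP code alone recovers group membership.

    Returns (proxy_accuracy, baseline_accuracy): the share of records whose
    group is guessed correctly by predicting the most common group within
    their ZIP code, versus always guessing the overall most common group.
    """
    by_zip = defaultdict(Counter)
    for zip_code, group in zip(zip_codes, groups):
        by_zip[zip_code][group] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in by_zip.values())
    baseline = Counter(groups).most_common(1)[0][1]
    return correct / len(groups), baseline / len(groups)


# Hypothetical records from two ZIP codes.
zip_codes = ["11111"] * 4 + ["22222"] * 4
groups = ["A", "A", "A", "B", "B", "B", "B", "A"]
print(proxy_strength(zip_codes, groups))
# (0.75, 0.5)
```

A proxy accuracy well above the baseline means the variable carries demographic information. This in-sample check is optimistic on small data; in practice one would evaluate it on held-out records, or fit a classifier on all remaining features rather than on one variable.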
A related representation issue: if the data is not balanced, meaning that different groups are not equally represented across outcomes, even carefully built models will not produce equal outcomes across groups. For example, in a company where most managers have been men, a well-constructed model that identifies candidates for management positions will be more likely to falsely flag male employees than female employees as strong candidates. Such a model is also likely to miss deserving female employees who would make good managers.
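Before modeling, this kind of imbalance can be checked by comparing how often each group has a positive outcome in the training data. The minimal sketch below uses hypothetical promotion records and an illustrative function name; a large gap between groups signals that the historical outcomes a model will learn from are unbalanced.

```python
def positive_outcome_rates(groups, outcomes):
    """Share of records with a positive outcome (1) within each group."""
    totals, positives = {}, {}
    for group, outcome in zip(groups, outcomes):
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + outcome
    return {g: positives[g] / totals[g] for g in totals}


# Hypothetical history: 1 = promoted to manager, 0 = not promoted.
groups = ["M"] * 5 + ["F"] * 5
promoted = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]
print(positive_outcome_rates(groups, promoted))
# {'M': 0.6, 'F': 0.2}
```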
Equity Awareness
Figure 1: The analytical process. The “standard” data modeling process is represented by the blue boxes, and the additional steps for bias awareness are represented by the orange ovals (and the addition of population-aware modeling) [5].
For all of these reasons and more, we need to be especially aware of how our models perform across demographic groups. One helpful tool for building this awareness is a bias dashboard, which breaks down a model’s performance by group; a number of commercial and open-source tools can produce such breakdowns. However, even when performance across groups on the test data is acceptable, the model may still contain bias. This is why careful monitoring of a model’s performance after deployment is an important part of an equity-aware decision-making system.
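The per-group breakdown a bias dashboard reports, and the ongoing monitoring described above, can be sketched in a few lines. The example assumes that true outcomes eventually become available for deployed predictions and uses accuracy as the only metric; the function names, the 0.05 tolerance and the data are hypothetical, and a real deployment would track several metrics, such as false positive and false negative rates.

```python
def accuracy_by_group(groups, y_true, y_pred):
    """Per-group accuracy: the core breakdown a bias dashboard reports."""
    hits, totals = {}, {}
    for group, actual, predicted in zip(groups, y_true, y_pred):
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(actual == predicted)
    return {g: hits[g] / totals[g] for g in totals}


def check_deployed_batch(groups, y_true, y_pred, max_gap=0.05):
    """Flag a batch of post-deployment predictions whose best-to-worst
    group accuracy gap exceeds max_gap (a hypothetical tolerance)."""
    accuracy = accuracy_by_group(groups, y_true, y_pred)
    gap = max(accuracy.values()) - min(accuracy.values())
    return accuracy, gap > max_gap


# Hypothetical batch of outcomes observed after deployment.
groups = ["A"] * 4 + ["B"] * 4
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
print(check_deployed_batch(groups, y_true, y_pred))
# ({'A': 1.0, 'B': 0.5}, True)
```

Running such a check on each new batch, rather than once on test data, is what turns a one-time audit into monitoring.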
There is no silver bullet that will make every algorithm fair and every decision-support tool unbiased. Over the past few years, however, there has been a proliferation of studies seeking to understand why algorithms behave the way they do, along with proposed frameworks for the unbiased application of AI/ML. We have therefore reached the point where it is simply no longer acceptable to develop biased algorithms or decision-support tools. We need to incorporate equity considerations into our modeling, apply bias-aware data mining processes and audit the performance of our models. Only by taking these steps to improve equity can we make it possible for everyone to benefit from AI/ML.
References
[1] Jeffrey Dastin, 2018, “Amazon scraps secret AI recruiting tool that showed bias against women,” Reuters, Oct. 10, https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G.
[2] Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner, 2016, “Machine bias,” ProPublica, May 23, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
[3] Ben Green and Yiling Chen, 2019, “Disparate interactions: An algorithm-in-the-loop analysis of fairness in risk assessments,” Proceedings of the Conference on Fairness, Accountability, and Transparency, Jan. 29-31, Atlanta, pp. 90-99.
[4] Anja Lambrecht and Catherine Tucker, 2019, “Algorithmic bias? An empirical study of apparent gender-based discrimination in the display of STEM career ads,” Management Science, Vol. 65, No. 7, pp. 2966-2981.
[5] David Anderson, Margrét V. Bjarnadóttir and David Ross, 2021, “There are no colorblind models in a colorful world: How to successfully apply a people analytics tool to build equitable workplaces,” Wharton People Analytics, White Paper Competition Winner, https://wpa.wharton.upenn.edu/wp-content/uploads/2021/04/Winner_-There-Are-No-Colorblind-Models-in-a-Colorful-World_-How-to-Successfully-Apply-a-People-Analytics-Tool-to-Build-Equitable-Workplaces.pdf.
Margrét V. Bjarnadóttir is an associate professor of management science and statistics in the Robert H. Smith School of Business at the University of Maryland, College Park. She graduated from MIT’s Operations Research Center in 2008. Her research focuses on analytics in healthcare and human resource management (and sports!). Her paper on algorithmic fairness recently won the Wharton People Analytics white paper competition.
