Statistical Tests for Replacing Human Decision Makers with Algorithms

Published Online: https://doi.org/10.1287/mnsc.2023.01845
