Leveraging Expert Consistency to Improve Algorithmic Decision Support

Published Online: https://doi.org/10.1287/mnsc.2022.01576

References

  • Adamson AS, Welch HG (2019) Machine learning and the cancer-diagnosis problem—No gold standard. New England J. Medicine 381(24):2285–2287.
  • Allegheny County Department of Human Services (2020) The Allegheny Family Screening Tool. https://www.alleghenycounty.us/Services/Human-Services-DHS/DHS-News-and-Events/Accomplishments-and-Innovations/Allegheny-Family-Screening-Tool.
  • Amirkhani H, Rahmati M (2014) Agreement/disagreement based crowd labeling. Appl. Intelligence 41(1):212–222.
  • Angelova V, Dobbie W, Yang CS (2023) Algorithmic recommendations and human discretion. NBER Working Paper No. 31747, National Bureau of Economic Research, Cambridge, MA.
  • Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robotics Autonomous Systems 57(5):469–483.
  • Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond kappa: A review of interrater agreement measures. Canadian J. Statist. 27(1):3–23.
  • Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev. 104:671–732.
  • Bickel PJ, Klaassen CA, Ritov Y, Wellner JA (1993) Efficient and Adaptive Estimation for Semiparametric Models, vol. 4 (Johns Hopkins University Press, Baltimore).
  • Broderick T, Giordano R, Meager R (2020) An automatic finite-sample robustness metric: When can dropping a little data make a big difference? Preprint, submitted November 30, https://arxiv.org/abs/2011.14999.
  • Buçinca Z, Malaya MB, Gajos KZ (2021) To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proc. ACM Human-Comput. Interaction (ACM, New York).
  • Calinon S (2009) Robot Programming by Demonstration (EPFL Press, Lausanne, Switzerland).
  • Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. Proc. 23rd Internat. Conf. Machine Learn. (ICML, San Diego), 161–168.
  • Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N (2015) Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1721–1730.
  • Chalfin A, Danieli O, Hillis A, Jelveh Z, Luca M, Ludwig J, Mullainathan S (2016) Productivity and selection of human capital with machine learning. Amer. Econom. Rev. 106(5):124–127.
  • Chiang C-W, Yin M (2021) You’d better stop! Understanding human reliance on machine learning models under covariate shift. Proc. 13th ACM Web Sci. Conf. (ACM, New York), 120–129.
  • Chouldechova A, Benavides-Prado D, Fialko O, Vaithianathan R (2018) A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Conf. Fairness Accountability Transparency (ACM FAccT, New York), 134–148.
  • Cohen J (1960) A coefficient of agreement for nominal scales. Ed. Psych. Measurement 20(1):37–46.
  • Cook RD (1986) Assessment of local influence. J. Roy. Statist. Soc. Ser. B Methodological 48(2):133–155.
  • Cortes C, DeSalvo G, Mohri M (2016) Learning with rejection. Proc. Internat. Conf. Algorithmic Learn. Theory (Springer, New York), 67–82.
  • Coyle D, Weller A (2020) “Explaining” machine learning reveals policy challenges. Science 368(6498):1433–1434.
  • Dai J, Fazelpour S, Lipton Z (2021) Fair machine learning under partial compliance. Proc. AAAI/ACM Conf. AI Ethics Society (ACM, New York), 55–65.
  • D’Amour A, Srinivasan H, Atwood J, Baljekar P, Sculley D, Halpern Y (2020) Fairness is not static: Deeper understanding of long term fairness via simulation studies. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 525–534.
  • Das AS, Datar M, Garg A, Rajaram S (2007) Google News personalization: Scalable online collaborative filtering. Proc. 16th Internat. Conf. World Wide Web (ACM, New York), 271–280.
  • Davani AM, Díaz M, Prabhakaran V (2022) Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Trans. Assoc. Comput. Linguist. 10:92–110.
  • Dawes RM, Faust D, Meehl PE (1989) Clinical versus actuarial judgment. Science 243(4899):1668–1674.
  • De-Arteaga M, Fogliato R, Chouldechova A (2020) A case for humans-in-the-loop: Decisions in the presence of erroneous algorithmic scores. Proc. CHI Conf. Human Factors Comput. Systems (ACM, New York), 1–12.
  • Dietvorst BJ, Simmons JP, Massey C (2015) Algorithm aversion: People erroneously avoid algorithms after seeing them err. J. Experiment. Psych. General 144(1):114.
  • Dietvorst BJ, Simmons JP, Massey C (2018) Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Sci. 64(3):1155–1170.
  • Doan A, Ramakrishnan R, Halevy AY (2011) Crowdsourcing systems on the world-wide web. Comm. ACM 54(4):86–96.
  • Elster J (1992) Local Justice: How Institutions Allocate Scarce Goods and Necessary Burdens (Russell Sage Foundation, New York).
  • Eubanks V (2018) Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin’s Press, New York).
  • Fazelpour S, De-Arteaga M (2022) Diversity in sociotechnical machine learning systems. Big Data Soc. 9(1).
  • Fogliato R, Xiang A, Lipton Z, Nagin D, Chouldechova A (2021) On the validity of arrest as a proxy for offense: Race and the likelihood of arrest for violent crimes. Proc. AAAI/ACM Conf. AI Ethics Society (ACM, New York), 100–111.
  • Friedler SA, Scheidegger C, Venkatasubramanian S (2021) The (Im)possibility of fairness: Different value systems require different mechanisms for fair decision making. Comm. ACM 64(4):136–143.
  • Geva T, Saar-Tsechansky M (2020) Who is a better decision maker? Data-driven expert ranking under unobserved quality. Production Oper. Management 30(1):127–144.
  • Green B, Chen Y (2021) Algorithmic risk assessments can alter human decision-making processes in high-stakes government contexts. Proc. ACM Human-Comput. Interaction (ACM, New York).
  • Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C (2000) Clinical versus mechanical prediction: A meta-analysis. Psych. Assessment 12(1):19.
  • Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22):2402–2410.
  • Holstein K, De-Arteaga M, Tumati L, Cheng Y (2023) Toward supporting perceptual complementarity in human-AI collaboration via reflection on unobservables. Proc. ACM Human-Comput. Interaction (ACM, New York).
  • Hong WS, Haimovich AD, Taylor RA (2018) Predicting hospital admission at emergency department triage using machine learning. PLoS One 13(7):e0201016.
  • Horvitz EJ, Breese JS, Henrion M (1988) Decision theory in expert systems and artificial intelligence. Internat. J. Approximate Reasoning 2(3):247–302.
  • Jacobs AZ, Wallach H (2021) Measurement and fairness. Proc. ACM Conf. Fairness Accountability Transparency (ACM, New York), 375–385.
  • Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S (2021) MIMIC-IV-ED (version 1.0). PhysioNet (June 3), https://physionet.org/content/mimic-iv-ed/1.0/.
  • Kennedy EH (2016) Semiparametric theory and empirical processes in causal inference. Statistical Causal Inferences and Their Applications in Public Health Research (Springer, New York), 141–167.
  • Kleinberg J, Lakkaraju H, Leskovec J, Ludwig J, Mullainathan S (2017) Human decisions and machine predictions. Quart. J. Econom. 133(1):237–293.
  • Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (ICML, San Diego), 1885–1894.
  • Koh PWW, Ang K-S, Teo H, Liang PS (2019) On the accuracy of influence functions for measuring group effects. Adv. Neural Inform. Processing Systems 32:5254–5264.
  • Lakkaraju H, Kleinberg J, Leskovec J, Ludwig J, Mullainathan S (2017) The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 275–284.
  • Lebovitz S, Lifshitz-Assaf H, Levina N (2022) To engage or not to engage with AI for critical judgments: How professionals deal with opacity when using AI for medical diagnosis. Organization Sci. 33(1):126–148.
  • Lichtenstein S, Fischhoff B, Phillips LD (1977) Calibration of probabilities: The state of the art. Proc. 5th Res. Conf. Subjective Probability Utility Decision Making (Springer), 275–324.
  • Luca M, Kleinberg J, Mullainathan S (2016) Algorithms need managers, too. Harvard Bus. Rev. 94(1):20.
  • Madras D, Pitassi T, Zemel R (2018) Predict responsibly: Improving fairness and accuracy by learning to defer. Adv. Neural Inform. Processing Systems 31:6147–6157.
  • Marx C, Calmon F, Ustun B (2020) Predictive multiplicity in classification. Proc. Internat. Conf. Machine Learn. (PMLR), 6765–6774.
  • McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, et al. (2020) International evaluation of an AI system for breast cancer screening. Nature 577(7788):89–94.
  • Meehl PE (1954) Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Proc. Invitational Conf. Testing Problems (University of Minnesota Press), 136–141.
  • Meyer G, Adomavicius G, Johnson PE, Elidrisi M, Rush WA, Sperl-Hillen JM, O’Connor PJ (2014) A machine learning approach to improving dynamic decision making. Inform. Systems Res. 25(2):239–263.
  • Mullainathan S, Obermeyer Z (2017) Does machine learning automate moral hazard and error? Amer. Econom. Rev. 107(5):476–480.
  • Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. Proc. 22nd Internat. Conf. Machine Learn. (ICML, San Diego), 625–632.
  • Northcutt CG, Wu T, Chuang IL (2017) Learning with confident examples: Rank pruning for robust classification with noisy labels. Proc. Uncertainty Artificial Intelligence.
  • Nosofsky RM, Stanton RD, Zaki SR (2005) Procedural interference in perceptual classification: Implicit learning or cognitive complexity? Memory Cognition 33(7):1256–1271.
  • Obermeyer Z, Powers B, Vogeli C, Mullainathan S (2019) Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464):447–453.
  • Passi S, Barocas S (2019) Problem formulation and fairness. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 39–48.
  • Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10(3):61–74.
  • Raghavan M, Barocas S, Kleinberg J, Levy K (2020) Mitigating bias in algorithmic hiring: Evaluating claims and practices. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 469–481.
  • Raita Y, Goto T, Faridi MK, Brown DF, Camargo CA, Hasegawa K (2019) Emergency department triage prediction of clinical outcomes using machine learning models. Critical Care 23(1):1–13.
  • Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, Ré C (2017) Snorkel: Rapid training data creation with weak supervision. Proc. VLDB Endowment. Internat. Conf. Very Large Data Bases, vol. 11, 269.
  • Saxena D, Badillo-Urquiola K, Wisniewski PJ, Guha S (2020) A human-centered review of algorithms used within the US child welfare system. Proc. CHI Conf. Human Factors Comput. Systems (ACM, New York), 1–15.
  • Shanteau J (1992) Competence in experts: The role of task characteristics. Organ. Behav. Human Decision Processes 53(2):252–266.
  • Shanteau J (2015) Why task domains (still) matter for understanding expertise. J. Appl. Res. Memory Cognition 4(3):169–175.
  • Sharda R, Barr SH, McDonnell JC (1988) Decision support system effectiveness: A review and an empirical test. Management Sci. 34(2):139–159.
  • Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—But is it good?: Evaluating non-expert annotations for natural language tasks. Proc. Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics), 254–263.
  • Tan C-H, Teo H-H, Benbasat I (2010) Assessing screening and evaluation decision support systems: A resource-matching approach. Inform. Systems Res. 21(2):305–326.
  • Umesh UN, Peterson RA, Sauber MH (1989) Interjudge agreement and the maximum value of kappa. Ed. Psych. Measurement 49(4):835–850.
  • U.S. Department of Health & Human Services, Administration for Children and Families, Administration on Children, Youth and Families, Children’s Bureau (2022) Child Maltreatment 2020 (January 19), https://www.acf.hhs.gov/cb/data-research/child-maltreatment.
  • Wang J, Oh J, Wang H, Wiens J (2018) Learning credible models. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 2417–2426.
  • Wilder B, Horvitz E, Kamar E (2020) Learning to complement humans. Proc. 29th Internat. Joint Conf. Artificial Intelligence (IJCAI), 1526–1533.
  • Yoganarasimhan H (2020) Search personalization using machine learning. Management Sci. 66(3):1045–1070.