Adamson AS, Welch HG (2019) Machine learning and the cancer-diagnosis problem—No gold standard. New England J. Medicine 381(24):2285–2287.
Allegheny County Department of Human Services (2020) Allegheny family screening tool. The Allegheny Family Screening Tool, https://www.alleghenycounty.us/Services/Human-Services-DHS/DHS-News-and-Events/Accomplishments-and-Innovations/Allegheny-Family-Screening-Tool.
Amirkhani H, Rahmati M (2014) Agreement/disagreement based crowd labeling. Appl. Intelligence 41(1):212–222.
Angelova V, Dobbie W, Yang CS (2023) Algorithmic recommendations and human discretion. NBER Working Paper No. 31747, National Bureau of Economic Research, Cambridge, MA.
Argall BD, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robotics Autonomous Systems 57(5):469–483.
Banerjee M, Capozzoli M, McSweeney L, Sinha D (1999) Beyond kappa: A review of interrater agreement measures. Canadian J. Statist. 27(1):3–23.
Barocas S, Selbst AD (2016) Big data’s disparate impact. California Law Rev. 104:671–732.
Bickel PJ, Klaassen CA, Ritov Y, Wellner JA (1993) Efficient and Adaptive Estimation for Semiparametric Models, vol. 4 (Johns Hopkins University Press, Baltimore).
Broderick T, Giordano R, Meager R (2020) An automatic finite-sample robustness metric: When can dropping a little data make a big difference? Preprint, submitted November 30, https://arxiv.org/abs/2011.14999.
Buçinca Z, Malaya MB, Gajos KZ (2021) To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proc. ACM Human-Comput. Interactions (ACM, New York).
Calinon S (2009) Robot Programming by Demonstration (EPFL Press, Lausanne, Switzerland).
Caruana R, Niculescu-Mizil A (2006) An empirical comparison of supervised learning algorithms. Proc. 23rd Internat. Conf. Machine Learn. (ICML, San Diego), 161–168.
Caruana R, Lou Y, Gehrke J, Koch P, Sturm M, Elhadad N (2015) Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission. Proc. 21st ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1721–1730.
Chalfin A, Danieli O, Hillis A, Jelveh Z, Luca M, Ludwig J, Mullainathan S (2016) Productivity and selection of human capital with machine learning. Amer. Econom. Rev. 106(5):124–127.
Chiang C-W, Yin M (2021) You’d better stop! Understanding human reliance on machine learning models under covariate shift. Proc. 13th ACM Web Sci. Conf. (ACM, New York), 120–129.
Chouldechova A, Benavides-Prado D, Fialko O, Vaithianathan R (2018) A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. Proc. Conf. Fairness Accountability Transparency (ACM FAccT, New York), 134–148.
Cohen J (1960) A coefficient of agreement for nominal scales. Ed. Psych. Measurement 20(1):37–46.
Cook RD (1986) Assessment of local influence. J. Roy. Statist. Soc. Ser. B Methodological 48(2):133–155.
Cortes C, DeSalvo G, Mohri M (2016) Learning with rejection. Proc. Internat. Conf. Algorithmic Learn. Theory (Springer, New York), 67–82.
Coyle D, Weller A (2020) “Explaining” machine learning reveals policy challenges. Science 368(6498):1433–1434.
Dai J, Fazelpour S, Lipton Z (2021) Fair machine learning under partial compliance. Proc. AAAI/ACM Conf. AI Ethics Society (ACM, New York), 55–65.
D’Amour A, Srinivasan H, Atwood J, Baljekar P, Sculley D, Halpern Y (2020) Fairness is not static: Deeper understanding of long term fairness via simulation studies. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 525–534.
Das AS, Datar M, Garg A, Rajaram S (2007) Google news personalization: Scalable online collaborative filtering. Proc. 16th Internat. Conf. World Wide Web (ACM, New York), 271–280.
Davani AM, Díaz M, Prabhakaran V (2022) Dealing with disagreements: Looking beyond the majority vote in subjective annotations. Trans. Assoc. Comput. Linguist. 10:92–110.
Dawes RM, Faust D, Meehl PE (1989) Clinical versus actuarial judgment. Science 243(4899):1668–1674.
De-Arteaga M, Fogliato R, Chouldechova A (2020) A case for humans-in-the-loop: Decisions in the presence of erroneous algorithmic scores. Proc. CHI Conf. Human Factors Comput. Systems (ACM, New York), 1–12.
Dietvorst BJ, Simmons JP, Massey C (2015) Algorithm aversion: People erroneously avoid algorithms after seeing them err. J. Experiment. Psych. General 144(1):114.
Dietvorst BJ, Simmons JP, Massey C (2018) Overcoming algorithm aversion: People will use imperfect algorithms if they can (even slightly) modify them. Management Sci. 64(3):1155–1170.
Doan A, Ramakrishnan R, Halevy AY (2011) Crowdsourcing systems on the world-wide web. Comm. ACM 54(4):86–96.
Elster J (1992) Local Justice: How Institutions Allocate Scarce Goods and Necessary Burdens (Russell Sage Foundation, New York).
Eubanks V (2018) Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (St. Martin’s Press, New York).
Fazelpour S, De-Arteaga M (2022) Diversity in sociotechnical machine learning systems. Big Data Soc. 9(1).
Fogliato R, Xiang A, Lipton Z, Nagin D, Chouldechova A (2021) On the validity of arrest as a proxy for offense: Race and the likelihood of arrest for violent crimes. Proc. AAAI/ACM Conf. AI Ethics Society (ACM, New York), 100–111.
Friedler SA, Scheidegger C, Venkatasubramanian S (2021) The (Im)possibility of fairness: Different value systems require different mechanisms for fair decision making. Comm. ACM 64(4):136–143.
Geva T, Saar-Tsechansky M (2020) Who is a better decision maker? Data-driven expert ranking under unobserved quality. Production Oper. Management 30(1):127–144.
Green B, Chen Y (2021) Algorithmic risk assessments can alter human decision-making processes in high-stakes government contexts. Proc. ACM Human-Comput. Interactions (ACM, New York).
Grove WM, Zald DH, Lebow BS, Snitz BE, Nelson C (2000) Clinical versus mechanical prediction: A meta-analysis. Psych. Assessment 12(1):19.
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, et al. (2016) Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22):2402–2410.
Holstein K, De-Arteaga M, Tumati L, Cheng Y (2023) Toward supporting perceptual complementarity in human-AI collaboration via reflection on unobservables. Proc. ACM Human-Comput. Interactions (ACM, New York).
Hong WS, Haimovich AD, Taylor RA (2018) Predicting hospital admission at emergency department triage using machine learning. PloS One 13(7):e0201016.
Horvitz EJ, Breese JS, Henrion M (1988) Decision theory in expert systems and artificial intelligence. Internat. J. Approximate Reasoning 2(3):247–302.
Jacobs AZ, Wallach H (2021) Measurement and fairness. Proc. ACM Conf. Fairness Accountability Transparency (ACM, New York), 375–385.
Johnson A, Bulgarelli L, Pollard T, Celi LA, Mark R, Horng S (2021) MIMIC-IV-ED (version 1.0). PhysioNet (June 3), https://physionet.org/content/mimic-iv-ed/1.0/.
Kennedy EH (2016) Semiparametric theory and empirical processes in causal inference. Statistical Causal Inferences and Their Applications in Public Health Research (Springer, New York), 141–167.
Kleinberg J, Lakkaraju H, Leskovec J, Ludwig J, Mullainathan S (2017) Human decisions and machine predictions. Quart. J. Econom. 133(1):237–293.
Koh PW, Liang P (2017) Understanding black-box predictions via influence functions. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (ICML, San Diego), 1885–1894.
Koh PWW, Ang K-S, Teo H, Liang PS (2019) On the accuracy of influence functions for measuring group effects. Adv. Neural Inform. Processing Systems 32:5254–5264.
Lakkaraju H, Kleinberg J, Leskovec J, Ludwig J, Mullainathan S (2017) The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 275–284.
Lebovitz S, Lifshitz-Assaf H, Levina N (2022) To engage or not to engage with AI for critical judgments: How professionals deal with opacity when using AI for medical diagnosis. Organization Sci. 33(1):126–148.
Lichtenstein S, Fischhoff B, Phillips LD (1977) Calibration of probabilities: The state of the art. Proc. 5th Res. Conf. Subjective Probability Utility Decision Making (Springer), 275–324.
Luca M, Kleinberg J, Mullainathan S (2016) Algorithms need managers, too. Harvard Bus. Rev. 94(1):20.
Madras D, Pitassi T, Zemel R (2018) Predict responsibly: Improving fairness and accuracy by learning to defer. Adv. Neural Inform. Processing Systems 31:6147–6157.
Marx C, Calmon F, Ustun B (2020) Predictive multiplicity in classification. Proc. Internat. Conf. Machine Learn. (PMLR), 6765–6774.
McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, Back T, et al. (2020) International evaluation of an AI system for breast cancer screening. Nature 577(7788):89–94.
Meehl PE (1954) Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Proc. Invitational Conf. Testing Problems (University of Minnesota Press), 136–141.
Meyer G, Adomavicius G, Johnson PE, Elidrisi M, Rush WA, Sperl-Hillen JM, O’Connor PJ (2014) A machine learning approach to improving dynamic decision making. Inform. Systems Res. 25(2):239–263.
Mullainathan S, Obermeyer Z (2017) Does machine learning automate moral hazard and error? Amer. Econom. Rev. 107(5):476–480.
Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. Proc. 22nd Internat. Conf. Machine Learn. (ICML, San Diego), 625–632.
Northcutt CG, Wu T, Chuang IL (2017) Learning with confident examples: Rank pruning for robust classification with noisy labels. Proc. Uncertainty Artificial Intelligence.
Nosofsky RM, Stanton RD, Zaki SR (2005) Procedural interference in perceptual classification: Implicit learning or cognitive complexity? Memory Cognition 33(7):1256–1271.
Obermeyer Z, Powers B, Vogeli C, Mullainathan S (2019) Dissecting racial bias in an algorithm used to manage the health of populations. Science 366(6464):447–453.
Passi S, Barocas S (2019) Problem formulation and fairness. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 39–48.
Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10(3):61–74.
Raghavan M, Barocas S, Kleinberg J, Levy K (2020) Mitigating bias in algorithmic hiring: Evaluating claims and practices. Proc. Conf. Fairness Accountability Transparency (ACM, New York), 469–481.
Raita Y, Goto T, Faridi MK, Brown DF, Camargo CA, Hasegawa K (2019) Emergency department triage prediction of clinical outcomes using machine learning models. Critical Care 23(1):1–13.
Ratner A, Bach SH, Ehrenberg H, Fries J, Wu S, Ré C (2017) Snorkel: Rapid training data creation with weak supervision. Proc. VLDB Endowment. Internat. Conf. Very Large Data Bases, vol. 11, 269.
Saxena D, Badillo-Urquiola K, Wisniewski PJ, Guha S (2020) A human-centered review of algorithms used within the US child welfare system. Proc. CHI Conf. Human Factors Comput. Systems (ACM, New York), 1–15.
Shanteau J (1992) Competence in experts: The role of task characteristics. Organ. Behav. Human Decision Processes 53(2):252–266.
Shanteau J (2015) Why task domains (still) matter for understanding expertise. J. Appl. Res. Memory Cognition 4(3):169–175.
Sharda R, Barr SH, McDonnell JC (1988) Decision support system effectiveness: A review and an empirical test. Management Sci. 34(2):139–159.
Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—But is it good?: Evaluating non-expert annotations for natural language tasks. Proc. Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics), 254–263.
Tan C-H, Teo H-H, Benbasat I (2010) Assessing screening and evaluation decision support systems: A resource-matching approach. Inform. Systems Res. 21(2):305–326.
Umesh UN, Peterson RA, Sauber MH (1989) Interjudge agreement and the maximum value of kappa. Ed. Psych. Measurement 49(4):835–850.
U.S. Department of Health & Human Services, Administration for Children and Families, Administration on Children, Youth and Families, Children’s Bureau (2022) Child Maltreatment 2020 (January 19), https://www.acf.hhs.gov/cb/data-research/child-maltreatment.
Wang J, Oh J, Wang H, Wiens J (2018) Learning credible models. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 2417–2426.
Wilder B, Horvitz E, Kamar E (2020) Learning to complement humans. Proc. 29th Internat. Joint Conf. Artificial Intelligence, 1526–1533.
Yoganarasimhan H (2020) Search personalization using machine learning. Management Sci. 66(3):1045–1070.

Volume 71, Issue 12

December 2025

Pages vii-x, 9869-10753, iv-vi

Article Information

Received: May 27, 2022
Accepted: September 07, 2024
Published Online: May 05, 2025

Cite as

Maria De-Arteaga, Vincent Jeanselme, Artur Dubrawski, Alexandra Chouldechova (2025) Leveraging Expert Consistency to Improve Algorithmic Decision Support. Management Science 71(12):10465-10485.

https://doi.org/10.1287/mnsc.2022.01576

Acknowledgments

The authors thank Benedikt Boecking for feedback throughout the process and the anonymous review team.
