Leveraging Expert Consistency to Improve Algorithmic Decision Support

