Cost-Aware Calibration of Classifiers

Published Online: https://doi.org/10.1287/ijds.2024.0038

References

  • Ayres-de Campos D, Bernardes J, Garrido A, Marques-de Sa J, Pereira-Leite L (2000) Sisporto 2.0: A program for automated analysis of cardiotocograms. J. Maternal-Fetal Medicine 9(5):311–318.
  • Ban GY, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.
  • Bayati M, Braverman M, Gillam M, Mack KM, Ruiz G, Smith MS, Horvitz E (2014) Data-driven decisions for reducing readmissions for heart failure: General methodology and case study. PLoS One 9(10):e109264.
  • Berardi G, Esuli A, Sebastiani F (2015) Utility-theoretic ranking for semiautomated text classification. ACM Trans. Knowledge Discovery Data 10(1):6.
  • Bertsimas D, Kallus N (2020) From predictive to prescriptive analytics. Management Sci. 66(3):1025–1044.
  • Card D, Smith NA (2018) The importance of calibration for estimating proportions from annotations. Walker M, Ji H, Stent A, eds. Proc. 2018 Conf. North American Chapter Assoc. Computational Linguistics Human Language Tech., vol. 1, Long Papers (Association for Computational Linguistics, New Orleans), 1636–1646.
  • Dembczynski K, Cheng W, Hüllermeier E (2010) Bayes optimal multilabel classification via probabilistic classifier chains. Proc. 27th Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 279–286.
  • Devlin J, Chang MW, Lee K, Toutanova K (2018) BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint, submitted October 11, https://arxiv.org/abs/1810.04805?amp=1.
  • Domingos P (1999) MetaCost: A general method for making classifiers cost-sensitive. Proc. Fifth ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 155–164.
  • Guo C, Pleiss G, Sun Y, Weinberger KQ (2017) On calibration of modern neural networks. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 70 (PMLR, New York), 1321–1330.
  • Hernández-Orallo J (2014) Probabilistic reframing for cost-sensitive regression. ACM Trans. Knowledge Discovery Data 8(4):17.
  • Huber J, Müller S, Fleischmann M, Stuckenschmidt H (2019) A data-driven newsvendor problem: From data to decision. Eur. J. Oper. Res. 278(3):904–915.
  • Kleinberg J, Mullainathan S, Raghavan M (2017) Inherent trade-offs in the fair determination of risk scores. Papadimitriou CH, ed. 8th Innovations Theoret. Comput. Sci. Conf. (ITCS 2017), Leibniz International Proceedings in Informatics (LIPIcs), vol. 67 (Schloss Dagstuhl - Leibniz-Zentrum für Informatik, Dagstuhl, Germany), 43:1–43:23.
  • Kuleshov V, Fenner N, Ermon S (2018) Accurate uncertainties for deep learning using calibrated regression. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 2796–2804.
  • Kumar A, Liang PS, Ma T (2019) Verified uncertainty calibration. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 33 (Curran Associates Inc., Red Hook, NY), 3792–3803.
  • Maas A, Daly RE, Pham PT, Huang D, Ng AY, Potts C (2011) Learning word vectors for sentiment analysis. Proc. 49th Annual Meeting Assoc. Computational Linguistics Human Language Tech., HLT ’11, vol. 1 (Association for Computational Linguistics, Cambridge, MA), 142–150.
  • Moro S, Cortez P, Rita P (2014) A data-driven approach to predict the success of bank telemarketing. Decision Support Systems 62:22–31.
  • Nguyen K, O’Connor B (2015) Posterior calibration and exploratory analysis for natural language processing models. Màrquez L, Callison-Burch C, Su J, eds. Proc. 2015 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Lisbon, Portugal), 1587–1598.
  • Niculescu-Mizil A, Caruana R (2005) Predicting good probabilities with supervised learning. Proc. 22nd Internat. Conf. Machine Learn. (ACM, New York), 625–632.
  • Pate A, Van Staa T, Emsley R (2020) An assessment of the potential miscalibration of cardiovascular disease risk predictions caused by a secular trend in cardiovascular disease in England. BMC Medical Res. Methodology 20(1):1–12.
  • Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv. Large Margin Classifiers 10(3):61–74.
  • Provost F (2005) Toward economic machine learning and utility-based data mining. Proc. 1st Internat. Workshop Utility-Based Data Mining (Association for Computing Machinery, New York), 1.
  • Shah ND, Steyerberg EW, Kent DM (2018) Big data and predictive analytics: Recalibrating expectations. J. Amer. Medical Assoc. 320(1):27–28.
  • Tuomo A, Suutala J, Röning J, Koskimäki H (2020) Better classifier calibration for small datasets. ACM Trans. Knowledge Discovery Data 14(3):1–19.
  • Vafeiadis T, Diamantaras KI, Sarigiannidis G, Chatzisavvas KC (2015) A comparison of machine learning techniques for customer churn prediction. Simulation Model. Practice Theory 55:1–9.
  • Van Calster B, McLernon DJ, Van Smeden M, Wynants L, Steyerberg EW (2019) Calibration: The Achilles heel of predictive analytics. BMC Medicine 17(1):1–7.
  • Viaene S, Dedene G (2005) Cost-sensitive learning and decision making revisited. Eur. J. Oper. Res. 166(1):212–220.
  • Voudouri A, Khain P, Carmona I, Bellprat O, Grazzini F, Avgoustoglou E, Bettems J, Kaufmann P (2017) Objective calibration of numerical weather prediction models. Atmospheric Res. 190:128–140.
  • Watson L, Guo C, Cormode G, Sablayrolles A (2021) On the importance of difficulty calibration in membership inference attacks. Preprint, submitted November 15, https://arxiv.org/abs/2111.08440.
  • Wolberg WH, Mangasarian OL (1990) Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 87(23):9193–9196.
  • Yeh IC, Lien CH (2009) The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems Appl. 36(2):2473–2480.
  • Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. Proc. Eighth ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 694–699.
  • Zhao S, Kim M, Sahoo R, Ma T, Ermon S (2021) Calibrating predictions to decisions: A novel approach to multi-class calibration. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Advances in Neural Information Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 22313–22324.
  • Zhu C, Byrd RH, Lu P, Nocedal J (1997) Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Software 23(4):550–560.
  • Zumel N, Mount J (2019) Practical Data Science with R (Manning, Shelter Island, NY).