Improving Reliability Estimation for Individual Numeric Predictions: A Machine Learning Approach

Published Online: https://doi.org/10.1287/ijoc.2020.1019
