Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice

Published Online: https://doi.org/10.1287/inte.2018.0957

References

  • American Psychiatric Association (2013) Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association Publishing, Washington, DC).
  • Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2017) Learning certifiably optimal rule lists for categorical data. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 35–44.
  • Angelino E, Larus-Stone N, Alabi D, Seltzer M, Rudin C (2018) Certifiably optimal rule lists for categorical data. J. Machine Learn. Res. 18:1–78.
  • Angwin J, Larson J, Mattu S, Kirchner L (2016) Machine bias. Accessed January 1, 2018, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
  • Antman EM, Cohen M, Bernink PJ, McCabe CH, Horacek T, Papuchis G, Mautner B, Corbalan R, Radley D, Braunwald E (2000) The TIMI risk score for unstable angina/non–ST elevation MI. J. Amer. Medical Assoc. 284(7):835–842.
  • Austin J, Ocker R, Bhati A (2010) Kentucky pretrial risk assessment instrument validation. Bureau of Justice Statistics. (October), https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=267494.
  • Berk RA, Bleich J (2013) Statistical procedures for forecasting criminal behavior. Criminol. Public Policy 12(3):513–544.
  • Bone R, Balk R, Cerra F, Dellinger R, Fein A, Knaus W, Schein R, Sibbald W, Abrams J, Bernard G, et al. (1992) American College of Chest Physicians/Society of Critical Care Medicine consensus conference: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Critical Care Medicine 20(6):864–874.
  • Breiman L (2001) Statistical modeling: The two cultures. Statist. Sci. 16(3):199–231.
  • Breiman L, Friedman J, Stone CJ, Olshen RA (1984) Classification and Regression Trees (CRC Press, Boca Raton, FL).
  • Burgess EW (1928) Factors determining success or failure on parole. Bruce AA, Harno AJ, Landesco J, Burgess EW, eds. Parole and the Indeterminate Sentence: A Report to the Chairman of the Parole Board of Illinois on “The Workings of the Indeterminate Sentence Law and the Parole System in Illinois” (Committee on the Study of the Workings of the Indeterminate Sentence Law and Parole, Springfield, IL), 205–249.
  • Bushway SD (2013) Is there any logic to using logit. Criminol. Public Policy 12(3):563–567.
  • Caruana R, Niculescu-Mizil A (2004) Data mining in metric space: An empirical analysis of supervised learning performance criteria. Proc. 10th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 69–78.
  • Chen C, Rudin C (2018) An optimization approach to learning falling rule lists. Storkey A, Perez-Cruz F, eds. Proc. Artificial Intelligence Statistics (AISTATS) (PMLR, Fort Lauderdale, FL), 604–612.
  • Chung F, Yegneswaran B, Liao P, Chung SA, Vairavanathan S, Islam S, Khajehdehi A, Shapiro CM (2008) Stop questionnaire: A tool to screen patients for obstructive sleep apnea. Anesthesiology 108(5):812–821.
  • Citron D (2016) (Un)fairness of risk scores in criminal sentencing. Forbes (July 13), https://www.forbes.com/sites/daniellecitron/2016/07/13/unfairness-of-risk-scores-in-criminal-sentencing/#10d06e974ad2.
  • Combs D, Shetty S, Parthasarathy S (2016) Big-data or slim-data: Predictive analytics will rule with world. J. Clinical Sleep Medicine 12(2):159–160.
  • Czeisler BM, Claassen J (2017) A novel clinical score to assess seizure risk. JAMA Neurology 74(12):1395–1396.
  • Danziger S, Levav J, Avnaim-Pesso L (2011) Extraneous factors in judicial decisions. Proc. Natl. Acad. Sci. USA 108(17):6889–6892.
  • Dawes RM (1979) The robust beauty of improper linear models in decision making. Amer. Psychol. 34(7):571–582.
  • Ertekin Ş, Rudin C (2015) A Bayesian approach to learning scoring systems. Big Data 3(4):267–276.
  • Fisher A, Rudin C, Dominici F (2018) Model class reliance: Variable importance measures for any machine learning model class, from the “Rashomon” perspective. Working paper, Cornell University, Ithaca, New York.
  • Freitas AA (2014) Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15(1):1–10.
  • Gage BF, Waterman AD, Shannon W, Boechler M, Rich MW, Radford MJ (2001) Validation of clinical classification schemes for predicting stroke. J. Amer. Medical Assoc. 285(22):2864–2870.
  • Goh ST, Rudin C (2014) Box drawings for learning with imbalanced data. Proc. 20th ACM SIGKDD Conf. Knowledge Discovery Data Mining (KDD) (ACM, New York), 333–342.
  • Goodman B, Flaxman S (2016) European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine 38(3):arXiv:1606.08813 [stat.ML].
  • Gottfredson DM, Snyder HN (2005) The Mathematics of Risk Classification: Changing Data into Valid Instruments for Juvenile Courts (Department of Justice, Office of Juvenile Justice and Delinquency Prevention, Washington, DC).
  • Hanson R, Thornton D (2003) Notes on the Development of Static-2002 (Department of the Solicitor General of Canada, Ottawa, Ontario).
  • Ho V (2017) Miscalculated score said to be behind release of alleged twin peaks killer. SFGate, San Francisco Chronicle (August 14), https://www.sfgate.com/crime/article/Miscalculated-score-said-to-be-behind-11818814.php.
  • Hoffman PB (1994) Twenty years of operational use of a risk prediction instrument: The United States parole commission’s salient factor score. J. Criminal Justice 22(6):477–494.
  • Hoffman PB, Adelberg S (1980) The salient factor score: A nontechnical overview. Federal Probation 44(1):44–52.
  • Holte RC (1993) Very simple classification rules perform well on most commonly used datasets. Machine Learn. 11(1):63–90.
  • Howard P, Francis B, Soothill K, Humphreys L (2009) OGRS 3: The revised offender group reconviction scale, Technical Report (Ministry of Justice, London).
  • ILOG (2007) CPLEX 11.0 User’s Manual (IBM, New York).
  • Johns MW (1991) A new method for measuring daytime sleepiness: The Epworth sleepiness scale. Sleep 14(6):540–545.
  • Kahneman D (2013) Thinking, Fast and Slow (Farrar, Straus and Giroux, New York).
  • Knaus WA, Draper EA, Wagner DP, Zimmerman JE (1985) Apache II: A severity of disease classification system. Critical Care Medicine 13(10):818–829.
  • Knaus WA, Zimmerman JE, Wagner DP, Draper EA, Lawrence DE (1981) Apache-acute physiology and chronic health evaluation: A physiologically based classification system. Critical Care Medicine 9(8):591–597.
  • Knaus WA, Wagner D, Draper E, Zimmerman J, Bergner M, Bastos P, Sirio C, Murphy D, Lotring T, Damiano A (1991) The Apache III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest J. 100(6):1619–1636.
  • Kodratoff Y (1994) The comprehensibility manifesto. KDD Nugget Newsletter (IOS Press, Amsterdam, Netherlands), 83–85.
  • Lakkaraju H, Rudin C (2017) Learning cost-effective and interpretable treatment regimes. Proc. 20th Internat. Conf. Artificial Intelligence Statistics (PMLR, Fort Lauderdale, FL), 166–175.
  • Latessa E, Smith P, Lemke R, Makarios M, Lowenkamp C (2009) Creation and validation of the Ohio risk assessment system: Final report. Center for criminal justice research, school of criminal justice (University of Cincinnati, Cincinnati, OH), http://www.ocjs.ohio.gov/ORAS_FinalReport.pdf.
  • Le Gall JR, Lemeshow S, Saulnier F (1993) A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. J. Amer. Medical Assoc. 270(24):2957–2963.
  • Letham B, Rudin C, McCormick TH, Madigan D (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Statist. 9(3):1350–1371.
  • Li O, Liu H, Chen C, Rudin C (2018) Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. Proc. 32nd AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1–8.
  • Moreno RP, Metnitz PG, Almeida E, Jordan B, Bauer P, Campos RA, Iapichino G, Edbrooke D, Capuzzo M, Le Gall JR (2005) SAPS 3-from evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Medicine 31(10):1345–1355.
  • Northpointe (2015) Correctional offender management profiling for alternative sanctions (COMPAS). Accessed January 1, 2018, http://www.northpointeinc.com/files/technical_documents/FieldGuide2_081412.pdf.
  • Pazzani MJ (2000) Knowledge discovery from data? Intelligent systems and their applications. IEEE 15(2):10–12.
  • Pekkala T, Hall A, Lötjönen J, Mattila J, Soininen H, Ngandu T, Laatikainen T, Kivipelto M, Solomon A (2017) Development of a late-life dementia prediction index with supervised machine learning in the population-based CAIDE study. J. Alzheimer’s Disease 55(3):1055–1067.
  • Pennsylvania Commission on Sentencing (2012) Risk/Needs Assessment Project Interim Report 4: Development of Risk Assessment Scale (Pennsylvania Commission on Sentencing, State College, PA).
  • Shah N, Steyerberg E, Kent D (2018) Big data and predictive analytics: Recalibrating expectations. J. Amer. Medical Assoc. 320(1):27–28.
  • Shaw P, Ahn K, Rapoport JL (2017) Good news for screening for adult attention-deficit/hyperactivity disorder. JAMA Psychiatry 74(5):527.
  • Six A, Backus B, Kelder J (2008) Chest pain in the emergency room: Value of the heart score. Netherlands Heart J. 16(6):191–196.
  • Souillard-Mandar W, Davis R, Rudin C, Au R, Libon DJ, Swenson R, Price CC, Lamar M, Penney DL (2016) Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test. Machine Learn. 102(3):393–441.
  • Struck AF, Ustun B, Rodriguez Ruiz A, Lee JW, LaRoche S, Hirsch LJ, Gilmore EJ, Rudin C, Westover BM (2017) A practical risk score for EEG seizures in hospitalized patients. JAMA Neurology 74(12):1419–1424.
  • Than M, Flaws D, Sanders S, Doust J, Glasziou P, Kline J, Aldous S, Troughton R, Reid C, Parsonage WA, et al. (2014) Development and validation of the emergency department assessment of chest pain score and 2 h accelerated diagnostic protocol. Emergency Medicine Australasia 26(1):34–44.
  • Tollenaar N, van der Heijden P (2013) Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models. J. Royal Statist. Soc. Ser. A 176(2):565–584.
  • U.S. Department of Justice, Bureau of Justice Statistics (2014) Recidivism of prisoners released in 1994. Accessed January 1, 2018, http://doi.org/10.3886/ICPSR03355.v8.
  • U.S. Sentencing Commission (1987) 2012 guidelines manual: Chapter four - criminal history and criminal livelihood. Accessed January 1, 2018, http://www.ussc.gov/guidelines-_manual/2012/2012-_4a11.
  • U.S. Sentencing Commission (2004) Measuring recidivism: The criminal history computation of the federal sentencing guidelines. Accessed January 1, 2018, https://www.ussc.gov/sites/default/files/pdf/research-and-publications/research-publications/2004/200405_Recidivism_Criminal_History.pdf.
  • Ustun B, Rudin C (2016a) Learning optimized risk scores for large-scale datasets. arXiv:1610.00168.
  • Ustun B, Rudin C (2016b) Supersparse linear integer models for optimized medical scoring systems. Machine Learn. 102(3):349–391.
  • Ustun B, Rudin C (2017) Optimized risk scores. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1125–1134.
  • Ustun B, Westover MB, Rudin C, Bianchi MT (2016) Clinical prediction models for sleep apnea: The importance of medical history over symptoms. J. Clinical Sleep Medicine 12(2):161–168.
  • Ustun B, Adler LA, Rudin C, Faraone SV, Spencer TJ, Berglund P, Gruber MJ, Kessler RC (2017) The World Health Organization adult attention-deficit/hyperactivity disorder self-report screening scale for DSM-5. JAMA Psychiatry 74(5):520–526.
  • Wang F, Rudin C (2015) Falling rule lists. Proc. 18th Internat. Conf. Artificial Intelligence Statistics (AISTATS), May 9–12, San Diego, CA.
  • Wang T, Rudin C, Doshi F, Liu Y, Klampfl E, MacNeille P (2016) Bayesian or’s of and’s for interpretable classification with application to context aware recommender systems. Lebanon G, Vishwanathan SVN, eds. Internat. Conf. Data Mining (ICDM) (PMLR, Fort Lauderdale, FL), arXiv:1504.07614 [cs.LG].
  • Wang T, Rudin C, Doshi-Velez F, Liu Y, Klampfl E, MacNeille P (2017) A Bayesian framework for learning rule sets for interpretable classification. J. Machine Learn. Res. 18(70):1–37.
  • Weathers FW, Litz BT, Keane TM, Palmieri PA, Marx BP, Schnurr PP (2013) The PTSD checklist for DSM-5 (pcl-5). National Center for PTSD, http://www.ptsd.va.gov.
  • Weng SF, Reps J, Kai J, Garibaldi JM, Qureshi N (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. Accessed June 1, 2017, http://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0174944.
  • Wexler R (2017a) Code of silence: How private companies hide flaws in the software that governments use to decide who goes to prison and who gets out. Washington Monthly, https://washingtonmonthly.com/magazine/junejulyaugust-2017/code-of-silence/.
  • Wexler R (2017b) When a computer program keeps you in jail: How computers are harming criminal justice. New York Times (June 13), https://www.nytimes.com/2017/06/13/opinion/how-computers-are-harming-criminal-justice.html.
  • Wolsey LA (1998) Integer Programming, Vol. 42 (Wiley, New York).
  • Yang H, Rudin C, Seltzer M (2017) Scalable Bayesian rule lists. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn. (ICML) (PMLR, Fort Lauderdale, FL), 3921–3930.
  • Zadrozny B, Elkan C (2002) Transforming classifier scores into accurate multiclass probability estimates. Proc. 8th ACM SIGKDD Internat. Conf. on Knowledge Discovery Data Mining (ACM, New York), 694–699.
  • Zeng J, Ustun B, Rudin C (2017) Interpretable classification models for recidivism prediction. J. Royal Statist. Soc. Ser. A 180(3):689–722.