Optimized Scoring Systems: Toward Trust in Machine Learning for Healthcare and Criminal Justice
Published Online: 3 Oct 2018, https://doi.org/10.1287/inte.2018.0957
References
- American Psychiatric Association (2013) Diagnostic and Statistical Manual of Mental Disorders (DSM-5) (American Psychiatric Association Publishing, Washington, DC).
- (2017) Learning certifiably optimal rule lists for categorical data. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 35–44.
- (2018) Certifiably optimal rule lists for categorical data. J. Machine Learn. Res. 18:1–78.
- (2016) Machine bias. Accessed January 1, 2018, https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing.
- (2000) The TIMI risk score for unstable angina/non–ST elevation MI. J. Amer. Medical Assoc. 284(7):835–842.
- (2010) Kentucky pretrial risk assessment instrument validation. Bureau of Justice Statistics. (October), https://www.ncjrs.gov/App/Publications/abstract.aspx?ID=267494.
- (2013) Statistical procedures for forecasting criminal behavior. Criminol. Public Policy 12(3):513–544.
- (1992) American College of Chest Physicians/Society of Critical Care Medicine consensus conference: Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. Critical Care Medicine 20(6):864–874.
- (2001) Statistical modeling: The two cultures. Statist. Sci. 16(3):199–231.
- (1984) Classification and Regression Trees (CRC Press, Boca Raton, FL).
- (1928) Factors determining success or failure on parole. Bruce AA, Harno AJ, Landesco J, Burgess EW, eds. Parole and the Indeterminate Sentence: A Report to the Chairman of the Parole Board of Illinois on “The Workings of the Indeterminate Sentence Law and the Parole System in Illinois” (Committee on the Study of the Workings of the Indeterminate Sentence Law and Parole, Springfield, IL), 205–249.
- (2013) Is there any logic to using logit. Criminol. Public Policy 12(3):563–567.
- (2004) Data mining in metric space: An empirical analysis of supervised learning performance criteria. Proc. 10th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 69–78.
- (2018) An optimization approach to learning falling rule lists. Storkey A, Perez-Cruz F, eds. Proc. Artificial Intelligence Statistics (AISTATS) (PMLR, Fort Lauderdale, FL), 604–612.
- (2008) STOP questionnaire: A tool to screen patients for obstructive sleep apnea. Anesthesiology 108(5):812–821.
- (2016) (Un)fairness of risk scores in criminal sentencing. Forbes (July 13), https://www.forbes.com/sites/daniellecitron/2016/07/13/unfairness-of-risk-scores-in-criminal-sentencing/#10d06e974ad2.
- (2016) Big-data or slim-data: Predictive analytics will rule with world. J. Clinical Sleep Medicine 12(2):159–160.
- (2017) A novel clinical score to assess seizure risk. JAMA Neurology 74(12):1395–1396.
- (2011) Extraneous factors in judicial decisions. Proc. Natl. Acad. Sci. USA 108(17):6889–6892.
- (1979) The robust beauty of improper linear models in decision making. Amer. Psychol. 34(7):571–582.
- (2015) A Bayesian approach to learning scoring systems. Big Data 3(4):267–276.
- (2018) Model class reliance: Variable importance measures for any machine learning model class, from the “Rashomon” perspective. Working paper, Cornell University, Ithaca, New York.
- (2014) Comprehensible classification models: A position paper. ACM SIGKDD Explorations Newsletter 15(1):1–10.
- (2001) Validation of clinical classification schemes for predicting stroke. J. Amer. Medical Assoc. 285(22):2864–2870.
- (2014) Box drawings for learning with imbalanced data. Proc. 20th ACM SIGKDD Conf. Knowledge Discovery Data Mining (KDD) (ACM, New York), 333–342.
- (2016) European Union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine 38(3). arXiv:1606.08813 [stat.ML].
- (2005) The Mathematics of Risk Classification: Changing Data into Valid Instruments for Juvenile Courts (Department of Justice, Office of Juvenile Justice and Delinquency Prevention, Washington, DC).
- (2003) Notes on the Development of Static-2002 (Department of the Solicitor General of Canada, Ottawa, Ontario).
- (2017) Miscalculated score said to be behind release of alleged Twin Peaks killer. SFGate, San Francisco Chronicle (August 14), https://www.sfgate.com/crime/article/Miscalculated-score-said-to-be-behind-11818814.php.
- (1994) Twenty years of operational use of a risk prediction instrument: The United States Parole Commission’s salient factor score. J. Criminal Justice 22(6):477–494.
- (1980) The salient factor score: A nontechnical overview. Federal Probation 44(1):44–52.
- (1993) Very simple classification rules perform well on most commonly used datasets. Machine Learn. 11(1):63–90.
- (2009) OGRS 3: The revised offender group reconviction scale, Technical Report (Ministry of Justice, London).
- ILOG (2007) CPLEX 11.0 User’s Manual (IBM, New York).
- (1991) A new method for measuring daytime sleepiness: The Epworth sleepiness scale. Sleep 14(6):540–545.
- (2013) Thinking, Fast and Slow (Farrar, Straus and Giroux, New York).
- (1985) APACHE II: A severity of disease classification system. Critical Care Medicine 13(10):818–829.
- (1981) APACHE-acute physiology and chronic health evaluation: A physiologically based classification system. Critical Care Medicine 9(8):591–597.
- (1991) The APACHE III prognostic system. Risk prediction of hospital mortality for critically ill hospitalized adults. Chest J. 100(6):1619–1636.
- (1994) The comprehensibility manifesto. KDD Nugget Newsletter (IOS Press, Amsterdam, Netherlands), 83–85.
- (2017) Learning cost-effective and interpretable treatment regimes. Proc. 20th Internat. Conf. Artificial Intelligence Statistics (PMLR, Fort Lauderdale, FL), 166–175.
- (2009) Creation and validation of the Ohio risk assessment system: Final report. Center for Criminal Justice Research, School of Criminal Justice (University of Cincinnati, Cincinnati, OH), http://www.ocjs.ohio.gov/ORAS_FinalReport.pdf.
- (1993) A new simplified acute physiology score (SAPS II) based on a European/North American multicenter study. J. Amer. Medical Assoc. 270(24):2957–2963.
- (2015) Interpretable classifiers using rules and Bayesian analysis: Building a better stroke prediction model. Ann. Appl. Statist. 9(3):1350–1371.
- (2018) Deep learning for case-based reasoning through prototypes: A neural network that explains its predictions. Proc. 32nd AAAI Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 1–8.
- (2005) SAPS 3-from evaluation of the patient to evaluation of the intensive care unit. Part 2: Development of a prognostic model for hospital mortality at ICU admission. Intensive Care Medicine 31(10):1345–1355.
- Northpointe (2015) Correctional offender management profiling for alternative sanctions (COMPAS). Accessed January 1, 2018, http://www.northpointeinc.com/files/technical_documents/FieldGuide2_081412.pdf.
- (2000) Knowledge discovery from data? IEEE Intelligent Systems and Their Applications 15(2):10–12.
- (2017) Development of a late-life dementia prediction index with supervised machine learning in the population-based CAIDE study. J. Alzheimer’s Disease 55(3):1055–1067.
- Pennsylvania Commission on Sentencing (2012) Risk/Needs Assessment Project Interim Report 4: Development of Risk Assessment Scale (Pennsylvania Commission on Sentencing, State College, PA).
- (2018) Big data and predictive analytics: Recalibrating expectations. J. Amer. Medical Assoc. 320(1):27–28.
- (2017) Good news for screening for adult attention-deficit/hyperactivity disorder. JAMA Psychiatry 74(5):527.
- (2008) Chest pain in the emergency room: Value of the HEART score. Netherlands Heart J. 16(6):191–196.
- (2016) Learning classification models of cognitive conditions from subtle behaviors in the digital clock drawing test. Machine Learn. 102(3):393–441.
- (2017) A practical risk score for EEG seizures in hospitalized patients. JAMA Neurology 74(12):1419–1424.
- (2014) Development and validation of the emergency department assessment of chest pain score and 2 h accelerated diagnostic protocol. Emergency Medicine Australasia 26(1):34–44.
- (2013) Which method predicts recidivism best? A comparison of statistical, machine learning and data mining predictive models. J. Royal Statist. Soc. Ser. A 176(2):565–584.
- U.S. Department of Justice, Bureau of Justice Statistics (2014) Recidivism of prisoners released in 1994. Accessed January 1, 2018, http://doi.org/10.3886/ICPSR03355.v8.
- U.S. Sentencing Commission (1987) 2012 guidelines manual: Chapter four: Criminal history and criminal livelihood. Accessed January 1, 2018, http://www.ussc.gov/guidelines-_manual/2012/2012-_4a11.
- U.S. Sentencing Commission (2004) Measuring recidivism: The criminal history computation of the federal sentencing guidelines. Accessed January 1, 2018, https://www.ussc.gov/sites/default/files/pdf/research-and-publications/research-publications/2004/200405_Recidivism_Criminal_History.pdf.
- (2016a) Learning optimized risk scores for large-scale datasets. arXiv:1610.00168.
- (2016b) Supersparse linear integer models for optimized medical scoring systems. Machine Learn. 102(3):349–391.
- (2017) Optimized risk scores. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 1125–1134.
- (2016) Clinical prediction models for sleep apnea: The importance of medical history over symptoms. J. Clinical Sleep Medicine 12(2):161–168.
- (2017) The World Health Organization adult attention-deficit/hyperactivity disorder self-report screening scale for DSM-5. JAMA Psychiatry 74(5):520–526.
- (2015) Falling rule lists. Proc. 18th Internat. Conf. Artificial Intelligence Statistics (AISTATS), May 9–12, San Diego, CA.
- (2016) Bayesian or’s of and’s for interpretable classification with application to context aware recommender systems. Lebanon G, Vishwanathan SVN, eds. Internat. Conf. Data Mining (ICDM) (PMLR, Fort Lauderdale, FL), arXiv:1504.07614 [cs.LG].
- (2017) A Bayesian framework for learning rule sets for interpretable classification. J. Machine Learn. Res. 18(70):1–37.
- (2013) The PTSD checklist for DSM-5 (PCL-5). National Center for PTSD, http://www.ptsd.va.gov.
- (2017) Can machine-learning improve cardiovascular risk prediction using routine clinical data? PLoS One. Accessed June 1, 2017, http://journals.plos.org/plosone/article/authors?id=10.1371/journal.pone.0174944.
- (2017a) Code of silence: How private companies hide flaws in the software that governments use to decide who goes to prison and who gets out. Washington Monthly, https://washingtonmonthly.com/magazine/junejulyaugust-2017/code-of-silence/.
- (2017b) When a computer program keeps you in jail: How computers are harming criminal justice. New York Times (June 13), https://www.nytimes.com/2017/06/13/opinion/how-computers-are-harming-criminal-justice.html.
- (1998) Integer Programming, Vol. 42 (Wiley, New York).
- (2017) Scalable Bayesian rule lists. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn. (ICML) (PMLR, Fort Lauderdale, FL), 3921–3930.
- (2002) Transforming classifier scores into accurate multiclass probability estimates. Proc. 8th ACM SIGKDD Internat. Conf. on Knowledge Discovery Data Mining (ACM, New York), 694–699.
- (2017) Interpretable classification models for recidivism prediction. J. Royal Statist. Soc. Ser. A 180(3):689–722.

