A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results

Published Online: https://doi.org/10.1287/mnsc.2020.3818
