A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results

Beau Coker (Corresponding Author)
https://orcid.org/0000-0003-3811-5674
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115

Cynthia Rudin
https://orcid.org/0000-0003-4283-2780
Department of Computer Science and Department of Electrical and Computer Engineering, Duke University, Durham, North Carolina 27708

Gary King (Corresponding Author)
https://orcid.org/0000-0002-5327-7631
Institute for Quantitative Social Science, Harvard University, Cambridge, Massachusetts 02138

Published Online: 25 Mar 2021
https://doi.org/10.1287/mnsc.2020.3818

Volume 67, Issue 10

October 2021

Pages 5969-6627, iii-iv

Received: April 22, 2018
Accepted: July 06, 2020
Published Online: March 25, 2021

Cite as

Beau Coker, Cynthia Rudin, Gary King (2021) A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results. Management Science 67(10):6174-6197.

https://doi.org/10.1287/mnsc.2020.3818
