A Theory of Statistical Inference for Ensuring the Robustness of Scientific Results
Published Online: 25 Mar 2021, https://doi.org/10.1287/mnsc.2020.3818

