PERSPECTIVE—Researchers Should Make Thoughtful Assessments Instead of Null-Hypothesis Significance Tests

Published Online: https://doi.org/10.1287/orsc.1100.0557
