An Investigation of p-Hacking in E-Commerce A/B Testing
Abstract
In recent years, randomized experiments (or “A/B tests”) have become commonplace in many industrial settings as managers increasingly seek the aid of scientific rigor in their decision making. However, just as this practice has proliferated among firms, the problem of p-hacking—whereby experimenters adjust their sample size or try several statistical analyses until they find one that produces a statistically significant p-value—has emerged as a prevalent concern in the scientific community. Notably, many commentators have highlighted how A/B testing software enables and may even encourage p-hacking behavior. To investigate this phenomenon, we analyze the prevalence of p-hacking in a primary sample of 2,270 experiments conducted by 242 firms on a large U.S.-based e-commerce A/B testing platform. Using multiple statistical techniques—including a novel approach we call the asymmetric caliper test—we analyze the p-values corresponding to each experiment’s designated target metric across multiple significance thresholds. Our findings reveal essentially no evidence for p-hacking in our data. In an extended sample that examines p-hacking across all outcome metrics (encompassing more than 16,000 p-values in total), we similarly observe no evidence of p-hacking behavior. We use simulations to determine that if a modest effect of p-hacking were present in our data set, our methods would have the power to detect it at our current sample size. We contrast our results with the prevalence of p-hacking in academic contexts and discuss a number of possible factors explaining the divergent results, highlighting the potential roles of organizational learning and economic incentives.
History: Olivia Liu Sheng, Senior Editor; Gordon Burtch, Associate Editor.
Funding: The authors are grateful to the Baker Retailing Center and the Mack Institute for Innovation Management at the University of Pennsylvania for helping fund this work.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2024.0872.
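Illustrative note: The abstract refers to caliper-style tests on p-values near a significance threshold. The following is a minimal sketch of a standard symmetric caliper test, not the asymmetric variant the authors propose, whose details are not given in this section. It counts p-values in a narrow window just below and just above the threshold and applies a one-sided exact binomial test, assuming that without p-hacking a p-value in the window is equally likely to fall on either side. The function name `caliper_test`, the parameter `width`, and the example p-values are hypothetical and chosen for illustration only.

```python
from math import comb


def caliper_test(p_values, threshold=0.05, width=0.01):
    """Symmetric caliper test around a significance threshold (illustrative).

    Returns (below, above, p) where `below` and `above` are the counts of
    p-values in [threshold - width, threshold) and [threshold, threshold + width),
    and `p` is the one-sided exact binomial probability of seeing at least
    `below` of the n = below + above values on the significant side when each
    side is equally likely (assumed null of no p-hacking).
    """
    below = sum(1 for p in p_values if threshold - width <= p < threshold)
    above = sum(1 for p in p_values if threshold <= p < threshold + width)
    n = below + above
    if n == 0:
        # No observations in the window: the test is uninformative.
        return below, above, 1.0
    # Upper tail of Binomial(n, 0.5): P(X >= below).
    tail = sum(comb(n, k) for k in range(below, n + 1)) / 2 ** n
    return below, above, tail


# Hypothetical p-values, for illustration only (not data from the study).
example = [0.041, 0.045, 0.049, 0.052, 0.058, 0.20, 0.70]
below, above, p = caliper_test(example)
print(below, above, p)
```

A small one-sided probability would indicate an excess of p-values just below the threshold, the pattern usually associated with p-hacking. The choice of window width trades off bias against power: a narrower window makes the equal-likelihood assumption more plausible but leaves fewer observations.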

