Getting the Most Out of A/B Tests Using the Asymptotic Minimax-Regret Criteria

Published Online: https://doi.org/10.1287/mnsc.2024.06590

References

  • Abadie A, Imbens GW (2006) Large sample properties of matching estimators for average treatment effects. Econometrica 74:235–267.
  • Abadie A, Imbens GW (2016) Matching on the estimated propensity score. Econometrica 84:781–807.
  • Adusumilli K (2022) Neyman allocation is minimax optimal for best arm identification with two arms. Working paper, University of Pennsylvania, Philadelphia.
  • Agrawal S, Juneja S, Glynn P (2019) Optimal delta-correct best-arm selection for general distributions. Preprint, submitted August 24, https://arxiv.org/abs/1908.09094v1.
  • Altman DG (1980) Statistics and ethics in medical research: III. How large a sample? British Medical J. 281:1336–1338.
  • Amrhein V, Greenland S, McShane B (2019a) Scientists rise up against statistical significance. Nature 567:305–307.
  • Amrhein V, Trafimow D, Greenland S (2019b) Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. Amer. Statist. 73:262–270.
  • Athey S, Wager S (2021) Policy learning with observational data. Econometrica 89:133–161.
  • Audibert J-Y, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. Kalai AT, Mohri M, eds. COLT 23rd Conf. Learn. Theory (Omnipress, Madison, WI), 41–53.
  • Bakshy E, Eckles D (2013) Uncertainty in online experiments with dependent data: An evaluation of bootstrap methods. Proc. 19th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1303–1311.
  • Banerjee A, Duflo E, Glennerster R, Kinnan C (2015) The miracle of microfinance? Evidence from a randomized evaluation. Amer. Econom. J. Appl. Econom. 7:22–53.
  • Berger JO (1985) Statistical Decision Theory and Bayesian Analysis, 2nd ed. (Springer, New York).
  • Blyth CR (1986) Approximate binomial confidence limits. J. Amer. Statist. Assoc. 81:843–855.
  • Boos DD, Hughes-Oliver JM (2000) How large does n have to be for Z and t intervals? Amer. Statist. 54:121–128.
  • Bradlow ET, Lenk PJ, Allenby GM, Rossi PE (2004) When BDT in marketing meant Bayesian decision theory: The influence of Paul Green’s research. Wind Y, Green PE, eds. Market Research and Modeling: Progress and Prospects (Kluwer Academic Publishers, Boston), 17–39.
  • Coey D, Cunningham T (2019) Improving treatment effect estimators through experiment splitting. World Wide Web Conf. (Association for Computing Machinery, New York), 285–295.
  • Farrell MH, Liang T, Misra S (2021) Deep neural networks for estimation and inference. Econometrica 89:181–213.
  • Feit EM, Berman R (2019) Test and roll: Profit-maximizing A/B tests. Marketing Sci. 38:1038–1058.
  • Finkelstein A, Taubman S, Wright B, Bernstein M, Gruber J, Newhouse JP, Allen H, Baicker K, Oregon Health Study Group (2012) The Oregon health insurance experiment: Evidence from the first year. Quart. J. Econom. 127:1057–1106.
  • Fisher RA (1925) Statistical Methods for Research Workers, 2nd ed. (Oliver and Boyd, Edinburgh, UK).
  • Garivier A, Kaufmann E (2016) Optimal best arm identification with fixed confidence. Conf. Learn. Theory (PMLR, New York), 998–1027.
  • Green PE (1961) Some intra-firm applications of Bayesian Decision Theory to problems in business planning. PhD thesis, The Wharton School of the University of Pennsylvania, Philadelphia.
  • Green PE (1962a) Bayesian decision theory in advertising. J. Advertising Res. 2:33–41.
  • Green PE (1962b) Bayesian statistics and product decisions. Bus. Horizons 5:101–109.
  • Green PE (1963) Bayesian decision theory in pricing strategy. J. Marketing 27:5–14.
  • Grover A, Markov T, Attia P, Jin N, Perkins N, Cheong B, Chen M, et al. (2018) Best arm identification in multi-armed bandits with delayed feedback. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 833–842.
  • Hahn J (1998) On the role of the propensity score in efficient semiparametric estimation of average treatment effects. Econometrica 66:315–331.
  • Hansen BE (2022) Probability and Statistics for Economists (Princeton University Press, Princeton, NJ).
  • Hirano K, Porter JR (2009) Asymptotics for statistical treatment rules. Econometrica 77:1683–1701.
  • Hirano K, Porter JR (2020) Asymptotic analysis of statistical decision rules in econometrics. Durlauf SN, Hansen LP, Heckman JJ, Matzkin RL, eds. Handbook of Econometrics, vol. 7A (North Holland, Amsterdam), 283–354.
  • Hirano K, Imbens GW, Ridder G (2003) Efficient estimation of average treatment effects using the estimated propensity score. Econometrica 71:1161–1189.
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2021) Time-uniform, nonparametric, nonasymptotic confidence sequences. Ann. Statist. 49:1055–1080.
  • International Conference on Harmonisation E9 Expert Working Group (1999) ICH harmonised tripartite guideline: Statistical principles for clinical trials. Statist. Med. 18:1905–1942.
  • Jamieson K, Nowak R (2014) Best-arm identification algorithms for multi-armed bandits in the fixed confidence setting. 2014 48th Annual Conf. Inform. Sci. Systems CISS (IEEE, Piscataway, NJ), 1–6.
  • Jeffreys H (1961) Theory of Probability (Oxford University Press, Oxford, UK).
  • Johnson GA, Lewis RA, Reiley DH (2017) When less is more: Data and power in advertising experiments. Marketing Sci. 36:43–53.
  • Karlin S, Rubin H (1956) The theory of decision procedures for distributions with monotone likelihood ratio. Ann. Math. Statist. 27:272–299.
  • Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best arm identification in multi-armed bandit models. J. Machine Learn. Res. 17:1–42.
  • Kitagawa T, Tetenov A (2018) Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86:591–616.
  • Kohavi R, Tang D, Xu Y (2020) Trustworthy Online Controlled Experiments: A Practical Guide to A/B Testing (Cambridge University Press, Cambridge, UK).
  • Kohavi R, Deng A, Longbotham R, Xu Y (2014) Seven rules of thumb for web site experimenters. Proc. 20th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1857–1866.
  • Komiyama J, Ariu K, Kato M, Qin C (2021) Optimal simple regret in Bayesian best arm identification. Preprint, submitted November 18, https://arxiv.org/abs/2111.09885v1.
  • Lehmann EL (1993) The Fisher, Neyman-Pearson theories of testing hypotheses: One theory or two? J. Amer. Statist. Assoc. 88:1242–1249.
  • Lehmann EL, Romano JP (2022) Testing Statistical Hypotheses, 4th ed. (Springer, Cham, Switzerland).
  • Lenth RV (2001) Some practical guidelines for effective sample size determination. Amer. Statist. 55:187–193.
  • Lewis RA, Rao JM (2015) The unfavorable economics of measuring the returns to advertising. Quart. J. Econom. 130:1941–1973.
  • Liese F, Miescke K-J (2008) Statistical Decision Theory—Estimation, Testing, and Selection (Springer, New York).
  • Manski CF (2004) Statistical treatment rules for heterogeneous populations. Econometrica 72:1221–1246.
  • Manski CF (2019) Treatment choice with trial data: Statistical decision theory should supplant hypothesis testing. Amer. Statist. 73:296–304.
  • Manski CF (2021) Econometrics for decision making: Building foundations sketched by Haavelmo and Wald. Econometrica 89:2827–2853.
  • Manski CF, Tetenov A (2016) Sufficient trial size to inform clinical practice. Proc. Natl. Acad. Sci. USA 113:10518–10523.
  • Mbakop E, Tabord-Meehan M (2021) Model selection for treatment choice: Penalized welfare maximization. Econometrica 89:825–848.
  • McShane BB, Bradlow ET, Lynch JG Jr, Meyer RJ (2024) “Statistical significance” and statistical reporting: Moving beyond binary. J. Marketing 88:1–19.
  • McShane BB, Gal D, Gelman A, Robert C, Tackett JL (2019) Abandon statistical significance. Amer. Statist. 73:235–245.
  • Newey WK, McFadden D (1994) Large sample estimation and hypothesis testing. Engle R, McFadden D, eds. Handbook of Econometrics, vol. 4 (Elsevier, Amsterdam), 2111–2245.
  • Neyman J, Pearson ES (1928a) On the use and interpretation of certain test criteria for purposes of statistical inference: Part I. Biometrika 20(1/2):175–240.
  • Neyman J, Pearson ES (1928b) On the use and interpretation of certain test criteria for purposes of statistical inference: Part II. Biometrika 20(1/2):263–294.
  • Perezgonzalez JD (2015) Fisher, Neyman-Pearson or NHST? A tutorial for teaching data testing. Frontiers Psych. 6:223.
  • Rossi PE, Allenby GM (2003) Bayesian statistics and marketing. Marketing Sci. 22:304–328.
  • Rossi PE, McCulloch RE, Allenby GM (1996) The value of purchase history data in target marketing. Marketing Sci. 15:321–340.
  • Rubin DB (1984) Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12(4):1151–1172.
  • Russo D (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68:1625–1647.
  • Sahni NS, Nair HS (2020a) Does advertising serve as a signal? Evidence from a field experiment in mobile search. Rev. Econom. Stud. 87:1529–1564.
  • Sahni NS, Nair HS (2020b) Sponsorship disclosure and consumer deception: Experimental evidence from native advertising in mobile search. Marketing Sci. 39:5–32.
  • Savage LJ (1951) The theory of statistical decision. J. Amer. Statist. Assoc. 46:55–67.
  • Sawyer AG, Peter JP (1983) The significance of statistical significance tests in marketing research. J. Marketing Res. 20:122–133.
  • Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26:639–658.
  • Stoye J (2009) Minimax regret treatment choice with finite samples. J. Econometrics 151:70–81.
  • Stoye J (2012) Minimax regret treatment choice with covariates or with limited validity of experiments. J. Econometrics 166:138–156.
  • Tetenov A (2012) Statistical treatment choice based on asymmetric minimax regret criteria. J. Econometrics 166:157–165.
  • The American Statistician (2019) Statistical Inference in the 21st Century: A World Beyond p < 0.05, vol. 73 (The American Statistician).
  • van der Vaart AW (1998) Asymptotic Statistics (Cambridge University Press, Cambridge, UK).
  • Wald A (1950) Statistical Decision Functions (Wiley, Hoboken, NJ).
  • Wasserstein RL, Lazar NA (2016) The ASA statement on p-values: Context, process, and purpose. Amer. Statist. 70(2):129–133.
  • Wasserstein RL, Schirm AL, Lazar NA (2019) Moving to a world beyond “p < 0.05”. Amer. Statist. 73:1–19.