Getting the Most Out of A/B Tests Using the Asymptotic Minimax-Regret Criteria

Joonhwi Joo (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-8716-2314
Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080

Khai X. Chiong
[email protected]
https://orcid.org/0000-0002-6713-8907
Naveen Jindal School of Management, The University of Texas at Dallas, Richardson, Texas 75080

Published Online: 11 Sep 2025
https://doi.org/10.1287/mnsc.2024.06590

Volume 72, Issue 5

May 2026

Pages vii-xii, 3629-4567, iv-vi

Article Information

Received: June 20, 2024
Accepted: February 24, 2025
Published Online: September 11, 2025

Cite as

Joonhwi Joo, Khai X. Chiong (2025) Getting the Most Out of A/B Tests Using the Asymptotic Minimax-Regret Criteria. Management Science 72(5):4450-4473.

https://doi.org/10.1287/mnsc.2024.06590

Acknowledgments

The authors thank Ron Berman, Jason Choi, Giovanni Compiani, Jean-Pierre Dubé, Andrey Fradkin, Silvia Hristakeva, Max Joo, Kyeongbae Kim, TI Kim, Xinyao Kong, Yufeng Huang, Samir Mamadehussene, Ilya Morozov, Ram Rao, Jiwoong Shin, Andrey Simonov, Jinyeong Son, Kosuke Uetake, Nils Wernerfelt, three anonymous reviewers, the associate editor, the department editor, seminar and conference participants at London School of Economics, Korea Advanced Institute of Science and Technology Business School, Sogang University, Tilburg University, Washington University in St. Louis Olin, Summer Institute of Competitive Strategy 2023, and Yale Customer Insights for helpful comments.
