Always Valid Inference: Continuous Monitoring of A/B Tests

Published Online: https://doi.org/10.1287/opre.2021.2135

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Adv. Neural Inform. Processing Systems 24:2312–2320.
  • Balsubramani A (2014) Sharp finite-time iterated-logarithm martingale concentration. Preprint, submitted May 12, https://arxiv.org/abs/1405.2639.
  • Balsubramani A, Ramdas A (2015) Sequential nonparametric testing with the law of the iterated logarithm. Preprint, submitted June 10, https://arxiv.org/abs/1506.03486.
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
  • Bubeck S, Munos R, Stoltz G (2009) Pure exploration in multi-armed bandits problems. Gavaldà R, Lugosi G, Zeugmann T, Zilles S, eds. Internat. Conf. Algorithmic Learning Theory, Lecture Notes in Computer Science, vol. 5809 (Springer, Berlin), 23–37.
  • Darling D, Robbins H (1967) Confidence sequences for mean, variance, and median. Proc. Natl. Acad. Sci. USA 58(1):66–68.
  • de la Peña VH, Lai TL, Shao QM (2008) Self-Normalized Processes: Limit Theory and Statistical Applications, Probability and its Applications (Springer Science & Business Media, Berlin).
  • Demets DL, Lan KG (1994) Interim analysis: The alpha spending function approach. Statist. Med. 13(13-14):1341–1352.
  • Even-Dar E, Mannor S, Mansour Y (2002) PAC bounds for multi-armed bandit and Markov decision processes. Kivinen J, Sloan RH, eds. COLT’02 Proc. 15th Annu. Conf. Comput. Learning Theory (Springer, Berlin), 255–270.
  • Fithian W, Wager S (2014) Semiparametric exponential families for heavy-tailed data. Biometrika 102(2):486–493.
  • Foster DP, Stine RA (2008) α-investing: A procedure for sequential control of expected false discoveries. J. Roy. Statist. Soc. Ser. B Statist. Methodology 70(2):429–444.
  • Ghosh BK, Sen PK (1991) Handbook of Sequential Analysis (CRC Press, Boca Raton, FL).
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2018) Uniform, nonparametric, non-asymptotic confidence sequences. Preprint, submitted October 18, https://arxiv.org/abs/1810.08240.
  • James W, Stein C (1961) Estimation with quadratic loss. Neyman J, ed. Proc. Fourth Berkeley Sympos. Math. Statist. Probab., vol. 1 (University of California Press, Berkeley, CA), 361–379.
  • Jamieson K, Jain L (2018) A bandit approach to multiple testing with false discovery control. Preprint, submitted September 6, https://arxiv.org/abs/1809.02235.
  • Jamieson K, Malloy M, Nowak R, Bubeck S (2014) lil’UCB: An optimal exploration algorithm for multi-armed bandits. Proc. 27th Conf. Learning Theory, Proc. Machine Learn. Res., vol. 35 (PMLR, Barcelona, Spain), 423–439.
  • Javanmard A, Montanari A (2016) Online rules for control of false discovery rate and false discovery exceedance. Preprint, submitted March 29, https://arxiv.org/abs/1603.09000.
  • Johari R, Koomen P, Pekelis L, Walsh D (2017) Peeking at A/B tests: Why it matters, and what to do about it. Proc. 23rd ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining KDD ’17 (Association for Computing Machinery, New York), 1517–1525.
  • Kalyanakrishnan S, Tewari A, Auer P, Stone P (2012) PAC subset selection in stochastic multi-armed bandits. Proc. 29th Internat. Conf. Machine Learning (Omnipress, Madison, WI), 655–662.
  • Kaufmann E, Cappé O, Garivier A (2014) On the complexity of A/B testing. Preprint, submitted May 13, https://arxiv.org/abs/1405.3224.
  • Kohavi R, Deng A, Frasca B, Walker T, Xu Y, Pohlmann N (2013) Online controlled experiments at large scale. Proc. 19th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 1168–1176.
  • Lai TL (2001) Sequential analysis: Some classical problems and new challenges. Statist. Sinica 11(2):303–351.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press).
  • Lehmann EL, Romano JP, Casella G (1986) Testing Statistical Hypotheses. Springer Texts in Statistics, vol. 150 (Wiley, New York).
  • Malek A, Katariya S, Chow Y, Ghavamzadeh M (2017) Sequential multiple hypothesis testing with type I error control. Proc. 20th Internat. Conf. Artificial Intelligence Statist., Proc. Machine Learn. Res., vol. 54 (PMLR, Fort Lauderdale, FL), 1468–1476.
  • Miller E (2010) How not to run an A/B test. Accessed June 2, 2021, http://www.evanmiller.org/how-not-to-run-an-ab-test.html.
  • Miller E (2015) Simple sequential A/B testing. Accessed June 2, 2021, http://www.evanmiller.org/sequential-ab-testing.html.
  • Pollak M, Siegmund D (1975) Approximations to the expected sample size of certain sequential tests. Ann. Statist. 3(6):1267–1282.
  • Robbins H (1970) Statistical methods related to the law of the iterated logarithm. Ann. Math. Statist. 41(5):1397–1409.
  • Robbins H, Siegmund D (1974) The expected sample size of some tests of power one. Ann. Statist. 2(3):415–436.
  • Scott SL (2015) Multi-armed bandit experiments in the online service economy. Appl. Stochastic Models Bus. Indust. 31(1):37–45.
  • Siegmund D (1978) Estimation following sequential tests. Biometrika 65(2):341–349.
  • Siegmund D (1985) Sequential Analysis: Tests and Confidence Intervals, Springer Series in Statistics (Springer, New York).
  • Tang D, Agarwal A, O’Brien D, Meyer M (2010) Overlapping experiment infrastructure: More, better, faster experimentation. Proc. 16th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 17–26.
  • Wald A (1945) Sequential tests of statistical hypotheses. Ann. Math. Statist. 16(2):117–186.
  • Yang F, Ramdas A, Jamieson K, Wainwright MJ (2017) A framework for multi-A(rmed)/B(andit) testing with online FDR control. Preprint, submitted June 16, https://arxiv.org/abs/1706.05378.
  • Zhao S, Zhou E, Sabharwal A, Ermon S (2016) Adaptive concentration inequalities for sequential decision problems. Adv. Neural Inform. Processing Systems 29:1343–1351.