Weak Signal Asymptotics for Sequentially Randomized Experiments

Xu Kuang
https://orcid.org/0000-0002-2221-1648
Graduate School of Business, Stanford University, Stanford, California 94305

Stefan Wager (Corresponding Author)
https://orcid.org/0000-0002-7526-9077
Graduate School of Business, Stanford University, Stanford, California 94305

Published Online: 6 Dec 2023
https://doi.org/10.1287/mnsc.2023.4964

References

Agrawal S, Goyal N (2017) Near-optimal regret bounds for Thompson sampling. J. ACM 64(5):1–24.
Araman VF, Caldentey R (2021) Diffusion approximations for a class of sequential testing problems. Preprint, submitted February 13, https://arxiv.org/abs/2102.07030.
Athey S, Wager S (2021) Policy learning with observational data. Econometrica 89(1):133–161.
Athey S, Baird S, Hadad V, Jamison J, McIntosh C, Özler B, Parisotto L (2021) Increasing the take-up of long acting reversible contraceptives among adolescents and young women in Cameroon. Development Research Group, The World Bank, Washington, DC.
Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. COLT, 217–226.
Auer P, Ortner R (2010) UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Math. Hungarica 61(1–2):55–65.
Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
Bubeck S, Liu C-Y (2014) Prior-free and prior-dependent regret bounds for Thompson sampling. Proc. 48th Annual Conf. on Inform. Sci. and Systems (IEEE, Piscataway, NJ), 1–9.
Caria S, Kasy M, Quinn S, Shami S, Teytelboym A (2020) An adaptive targeted field experiment: Job search assistance for refugees in Jordan. Working paper, University of Warwick, Coventry, England.
Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems, 2249–2257.
Chernoff H (1959) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.
Chick SE, Gans N (2009) Economic analysis of simulation selection problems. Management Sci. 55(3):421–437.
Durrett R (1996) Stochastic Calculus: A Practical Introduction, vol. 6 (CRC Press, Boca Raton, FL).
Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econom. Rev. 88(4):848–881.
Fan L, Glynn PW (2021) Diffusion approximations for Thompson sampling. Preprint, submitted May 19, https://arxiv.org/abs/2105.09232.
Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
Gamarnik D, Zeevi A (2006) Validity of heavy traffic steady-state approximations in generalized Jackson networks. Ann. Appl. Probability 16(1):56–90.
Glynn PW (1990) Diffusion approximations. Handbook Oper. Res. Management Sci. 2:145–198.
Hadad V, Hirshberg DA, Zhan R, Wager S, Athey S (2021) Confidence intervals for policy evaluation in adaptive experiments. Proc. National Acad. Sci. USA 118(15):e2014602118.
Harrison JM (1988) Brownian models of queueing networks with heterogeneous customer populations. Stochastic Differential Systems, Stochastic Control Theory and Applications (Springer, Berlin), 147–186.
Harrison JM, Reiman MI (1981) Reflected Brownian motion on an orthant. Ann. Probability 9(2):302–308.
Harrison JM, Sunar N (2015) Investment timing with incomplete information and multiple means of learning. Oper. Res. 63(2):442–457.
Hill DN, Nassif H, Liu Y, Iyer A, Vishwanathan SVN (2017) An efficient bandit algorithm for realtime multivariate optimization. Proc. 23rd ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining (ACM, New York), 1813–1821.
Hirano K, Porter JR (2009) Asymptotics for statistical treatment rules. Econometrica 77(5):1683–1701.
Hirano K, Porter JR (2021) Asymptotic representations for sequential experiments. Proc. Cowles Foundation Conf. on Econometrics (Yale University, New Haven, CT).
Howard SR, Ramdas A, McAuliffe J, Sekhon J (2021) Time-uniform, nonparametric, nonasymptotic confidence sequences. Ann. Statist. 49(2):1055–1080.
Iglehart DL, Whitt W (1970) Multiple channel queues in heavy traffic. I. Adv. Appl. Probability 2(1):150–177.
Kalvit A, Zeevi A (2021) A closer look at the worst-case behavior of multi-armed bandit algorithms. Adv. Neural Inform. Processing Systems 34:8807–8819.
Kasy M, Sautmann A (2021) Adaptive treatment assignment in experiments for policy choice. Econometrica 89(1):113–132.
Kasy M, Teytelboym A (2020) Adaptive targeted infectious disease testing. Oxford Rev. Econom. Policy 36(suppl 1):S77–S93.
Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Proc. Internat. Conf. on Algorithmic Learn. Theory (Springer, Berlin), 199–213.
Kelly FP, Laws CN (1993) Dynamic routing in open queueing networks: Brownian models, cut constraints and resource pooling. Queueing Systems 13(1):47–86.
Kitagawa T, Tetenov A (2018) Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86(2):591–616.
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
Lattimore T, Szepesvári C (2019) An information-theoretic approach to minimax regret in partial monitoring. 2111–2139.
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
Le Cam LM (1972) Limits of experiments. Scott EL, Le Cam LM, Neyman J, eds. Proc. 6th Berkeley Sympos. on Math. Statist. and Probability, vol. 6 (University of California Press, Berkeley–Los Angeles), 245–261.
Liu Y, Devraj AM, Van Roy B, Xu K (2022) Gaussian imagination in bandit learning. Preprint, submitted January 6, https://arxiv.org/abs/2201.01902.
Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (John Wiley & Sons, Hoboken, NJ).
Luedtke A, Chambaz A (2020) Performance guarantees for policy learning. Ann. Inst. Henri Poincare Probability Statist. 56(3):2162–2188.
Mannor S, Tsitsiklis JN (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5(Jun):623–648.
Massoulié L, Xu K (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.
Naghshvar M, Javidi T (2013) Active sequential hypothesis testing. Ann. Statist. 41(6):2703–2738.
Reiman MI (1984) Open queueing networks in heavy traffic. Math. Oper. Res. 9(3):441–458.
Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Series) 58(5):527–535.
Russo D (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.
Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.
Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A tutorial on Thompson sampling. Foundations Trends Machine Learn. 11(1):1–96.
Siegmund D (1985) Sequential Analysis: Tests and Confidence Intervals (Springer Science & Business Media, Boston).
Stroock DW, Varadhan SRS (2007) Multidimensional Diffusion Processes (Springer, Berlin).
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
Wager S, Kuang X (2021) Diffusion asymptotics for sequential experiments. Preprint, submitted January 25, https://arxiv.org/abs/2101.09855v3.
Wald A (1947) Sequential Analysis (John Wiley & Sons, New York).
Wang Z, Zenios S (2021) Adaptive design of clinical trials: A sequential learning approach. Preprint, submitted January 27, https://dx.doi.org/10.2139/ssrn.3713924.
Xu K, Yun S-Y (2020) Reinforcement with fading memories. Math. Oper. Res. 45(4):1258–1288.
Zhang K, Janson L, Murphy S (2020) Inference for batched bandits. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems (Curran Associates, Red Hook, NY), 9818–9829.

Volume 70, Issue 10

October 2024

Pages vii-xii, 6483-7343, iv-vi

Information

Received: August 23, 2022
Accepted: April 15, 2023
Published Online: December 06, 2023

Cite as

Xu Kuang, Stefan Wager (2023) Weak Signal Asymptotics for Sequentially Randomized Experiments. Management Science 70(10):7024-7041.

https://doi.org/10.1287/mnsc.2023.4964

Acknowledgments

The authors are grateful to the referees and editors at Management Science for their detailed comments. The authors are also grateful for valuable feedback and suggestions from Susan Athey, Lin Fan, Peter Glynn, David Goldberg, Keisuke Hirano, Michael Harrison, David Hirshberg, Max Kasy, Emilie Kaufmann, Tor Lattimore, Neil Walton, Jack Porter, Daniel Russo, David Simchi-Levi, as well as seminar participants at a number of venues. An earlier draft of this paper was circulated under the title “Diffusion Asymptotics for Sequential Experiments” (Wager and Kuang 2021). Xu Kuang published under a different full name in earlier versions of this manuscript.
