Weak Signal Asymptotics for Sequentially Randomized Experiments

Published Online: https://doi.org/10.1287/mnsc.2023.4964

References

  • Agrawal S, Goyal N (2017) Near-optimal regret bounds for Thompson sampling. J. ACM 64(5):1–24.
  • Araman VF, Caldentey R (2021) Diffusion approximations for a class of sequential testing problems. Preprint, submitted February 13, https://arxiv.org/abs/2102.07030.
  • Athey S, Wager S (2021) Policy learning with observational data. Econometrica 89(1):133–161.
  • Athey S, Baird S, Hadad V, Jamison J, McIntosh C, Özler B, Parisotto L (2021) Increasing the take-up of long acting reversible contraceptives among adolescents and young women in Cameroon. Development Research Group, The World Bank, Washington, DC.
  • Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. COLT, 217–226.
  • Auer P, Ortner R (2010) UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Math. Hungarica 61(1–2):55–65.
  • Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
  • Bubeck S, Liu C-Y (2014) Prior-free and prior-dependent regret bounds for Thompson sampling. Proc. 48th Annual Conf. on Inform. Sci. and Systems (IEEE, Piscataway, NJ), 1–9.
  • Caria S, Kasy M, Quinn S, Shami S, Teytelboym A (2020) An adaptive targeted field experiment: Job search assistance for refugees in Jordan. Working paper, University of Warwick, Coventry, England.
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Advances in Neural Information Processing Systems, 2249–2257.
  • Chernoff H (1959) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.
  • Chick SE, Gans N (2009) Economic analysis of simulation selection problems. Management Sci. 55(3):421–437.
  • Durrett R (1996) Stochastic Calculus: A Practical Introduction, vol. 6 (CRC Press, Boca Raton, FL).
  • Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Amer. Econom. Rev. 88(4):848–881.
  • Fan L, Glynn PW (2021) Diffusion approximations for Thompson sampling. Preprint, submitted May 19, https://arxiv.org/abs/2105.09232.
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Gamarnik D, Zeevi A (2006) Validity of heavy traffic steady-state approximations in generalized Jackson networks. Ann. Appl. Probability 16(1):56–90.
  • Glynn PW (1990) Diffusion approximations. Handbook Oper. Res. Management Sci. 2:145–198.
  • Hadad V, Hirshberg DA, Zhan R, Wager S, Athey S (2021) Confidence intervals for policy evaluation in adaptive experiments. Proc. National Acad. Sci. USA 118(15):e2014602118.
  • Harrison JM (1988) Brownian models of queueing networks with heterogeneous customer populations. Stochastic Differential Systems, Stochastic Control Theory and Applications (Springer, Berlin), 147–186.
  • Harrison JM, Reiman MI (1981) Reflected Brownian motion on an orthant. Ann. Probability 9(2):302–308.
  • Harrison JM, Sunar N (2015) Investment timing with incomplete information and multiple means of learning. Oper. Res. 63(2):442–457.
  • Hill DN, Nassif H, Liu Y, Iyer A, Vishwanathan SVN (2017) An efficient bandit algorithm for realtime multivariate optimization. Proc. 23rd ACM SIGKDD Internat. Conf. on Knowledge Discovery and Data Mining (ACM, New York), 1813–1821.
  • Hirano K, Porter JR (2009) Asymptotics for statistical treatment rules. Econometrica 77(5):1683–1701.
  • Hirano K, Porter JR (2021) Asymptotic representations for sequential experiments. Proc. Cowles Foundation Conf. on Econometrics (Yale University, New Haven, CT).
  • Howard SR, Ramdas A, McAuliffe J, Sekhon J (2021) Time-uniform, nonparametric, nonasymptotic confidence sequences. Ann. Statist. 49(2):1055–1080.
  • Iglehart DL, Whitt W (1970) Multiple channel queues in heavy traffic. I. Adv. Appl. Probability 2(1):150–177.
  • Kalvit A, Zeevi A (2021) A closer look at the worst-case behavior of multi-armed bandit algorithms. Adv. Neural Inform. Processing Systems 34:8807–8819.
  • Kasy M, Sautmann A (2021) Adaptive treatment assignment in experiments for policy choice. Econometrica 89(1):113–132.
  • Kasy M, Teytelboym A (2020) Adaptive targeted infectious disease testing. Oxford Rev. Econom. Policy 36(suppl 1):S77–S93.
  • Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Proc. Internat. Conf. on Algorithmic Learn. Theory (Springer, Berlin), 199–213.
  • Kelly FP, Laws CN (1993) Dynamic routing in open queueing networks: Brownian models, cut constraints and resource pooling. Queueing Systems 13(1):47–86.
  • Kitagawa T, Tetenov A (2018) Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86(2):591–616.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Lattimore T, Szepesvári C (2019) An information-theoretic approach to minimax regret in partial monitoring. 2111–2139.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Le Cam LM (1972) Limits of experiments. Scott EL, Le Cam LM, Neyman J, eds. Proc. 6th Berkeley Sympos. on Math. Statist. and Probability, vol. 6 (University of California Press, Berkeley–Los Angeles), 245–261.
  • Liu Y, Devraj AM, Van Roy B, Xu K (2022) Gaussian imagination in bandit learning. Preprint, submitted January 6, https://arxiv.org/abs/2201.01902.
  • Luce RD (1959) Individual Choice Behavior: A Theoretical Analysis (John Wiley & Sons, Hoboken, NJ).
  • Luedtke A, Chambaz A (2020) Performance guarantees for policy learning. Ann. Inst. Henri Poincaré Probability Statist. 56(3):2162–2188.
  • Mannor S, Tsitsiklis JN (2004) The sample complexity of exploration in the multi-armed bandit problem. J. Machine Learn. Res. 5(Jun):623–648.
  • Massoulié L, Xu K (2018) On the capacity of information processing systems. Oper. Res. 66(2):568–586.
  • Naghshvar M, Javidi T (2013) Active sequential hypothesis testing. Ann. Statist. 41(6):2703–2738.
  • Reiman MI (1984) Open queueing networks in heavy traffic. Math. Oper. Res. 9(3):441–458.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Series) 58(5):527–535.
  • Russo D (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.
  • Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.
  • Russo D, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A tutorial on Thompson sampling. Foundations Trends Machine Learn. 11(1):1–96.
  • Siegmund D (1985) Sequential Analysis: Tests and Confidence Intervals (Springer Science & Business Media, Boston).
  • Stroock DW, Varadhan SRS (2007) Multidimensional Diffusion Processes (Springer, Berlin).
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Wager S, Kuang X (2021) Diffusion asymptotics for sequential experiments. Preprint, submitted January 25, https://arxiv.org/abs/2101.09855v3.
  • Wald A (1947) Sequential Analysis (John Wiley & Sons, New York).
  • Wang Z, Zenios S (2021) Adaptive design of clinical trials: A sequential learning approach. Preprint, submitted January 27, https://dx.doi.org/10.2139/ssrn.3713924.
  • Xu K, Yun S-Y (2020) Reinforcement with fading memories. Math. Oper. Res. 45(4):1258–1288.
  • Zhang K, Janson L, Murphy S (2020) Inference for batched bandits. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems (Curran Associates, Red Hook, NY), 9818–9829.