Weak Signal Asymptotics for Sequentially Randomized Experiments

