Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling
References
- (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook NY), 2312–2320.Google Scholar
- (2013) Thompson sampling for contextual bandits with linear payoffs. PMLR 28(3):127–135.Google Scholar
- (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.Google Scholar
- (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2013) Bandits with heavy tail. IEEE Trans. Inform. Theory 59(11):7711–7717.Crossref, Google Scholar
- (2020) The elliptical potential lemma revisited. Preprint, submitted October 20, https://arxiv.org/abs/2010.10182.Google Scholar
- (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2011) Contextual bandits with linear payoff functions. J. Machine Learn. Res. 15:208–214.Google Scholar
- (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.Google Scholar
- (2018) An information-theoretic analysis for Thompson sampling with many actions. Bengio S, Wallach HM, eds. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 4161–4169.Google Scholar
- (2020) On worst-case regret of linear Thompson sampling. Preprint, submitted June 11, https://arxiv.org/abs/2006.06790.Google Scholar
- (2020) An improved regret bound for Thompson sampling in the gaussian linear bandit setting. 2020 IEEE Internat. Sympos. Inform. Theory (ISIT) (IEEE, Piscataway, NJ), 2783–2788.Google Scholar
- (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10(1):154–166.Crossref, Google Scholar
- (2019) Nearly minimax-optimal regret for linearly parameterized bandits. Preprint, submitted March 30, https://arxiv.org/abs/1904.00242Google Scholar
- (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
- (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.Google Scholar

