Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

Nima Hamidi
Nima Hamidi
[email protected]
Department of Statistics, Stanford University, Stanford, California 94305;
Search for more papers by this author
,
Mohsen Bayati
Mohsen Bayati
[email protected]
https://orcid.org/0000-0002-7280-912X
Operations, Information, and Technology, Graduate School of Business, Stanford University, Stanford, California 94305
Search for more papers by this author

Nima Hamidi

[email protected]

Department of Statistics, Stanford University, Stanford, California 94305;

Search for more papers by this author

Mohsen Bayati

[email protected]

https://orcid.org/0000-0002-7280-912X

Operations, Information, and Technology, Graduate School of Business, Stanford University, Stanford, California 94305

Search for more papers by this author

Published Online:4 Aug 2022https://doi.org/10.1287/opre.2022.2274

References

Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook NY), 2312–2320.Google Scholar
Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. PMLR 28(3):127–135.Google Scholar
Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.Google Scholar
Boyd SP, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Bubeck S, Cesa-Bianchi N, Lugosi G (2013) Bandits with heavy tail. IEEE Trans. Inform. Theory 59(11):7711–7717.Crossref, Google Scholar
Carpentier A, Vernade C, Abbasi-Yadkori Y (2020) The elliptical potential lemma revisited. Preprint, submitted October 20, https://arxiv.org/abs/2010.10182.Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. J. Machine Learn. Res. 15:208–214.Google Scholar
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. 21st Annual Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.Google Scholar
Dong S, Van Roy B (2018) An information-theoretic analysis for Thompson sampling with many actions. Bengio S, Wallach HM, eds. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 4161–4169.Google Scholar
Hamidi N, Bayati M (2020) On worst-case regret of linear Thompson sampling. Preprint, submitted June 11, https://arxiv.org/abs/2006.06790.Google Scholar
Kalkanli C, Özgür A (2020) An improved regret bound for Thompson sampling in the gaussian linear bandit setting. 2020 IEEE Internat. Sympos. Inform. Theory (ISIT) (IEEE, Piscataway, NJ), 2783–2788.Google Scholar
Lai TL, Ching ZW (1982) Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. Ann. Statist. 10(1):154–166.Crossref, Google Scholar
Li Y, Wang Y, Zhou Y (2019) Nearly minimax-optimal regret for linearly parameterized bandits. Preprint, submitted March 30, https://arxiv.org/abs/1904.00242Google Scholar
Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.Link, Google Scholar
Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.Google Scholar

Volume 71, Issue 4

July-August 2023

Pages iii-vi, 1021-1439, C2-C3

Article Information

Metrics

Information

Received:March 30, 2021
Accepted:December 21, 2021
Published Online:August 04, 2022

Cite as

Nima Hamidi, Mohsen Bayati (2022) Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling. Operations Research 71(4):1434-1439.

https://doi.org/10.1287/opre.2022.2274

Keywords

Acknowledgments

The authors gratefully acknowledge an insightful suggestion by Ofer Zeitouni.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

References

Volume 71, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News