Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling

Nima Hamidi
Department of Statistics, Stanford University, Stanford, California 94305
Mohsen Bayati
https://orcid.org/0000-0002-7280-912X
Operations, Information, and Technology, Graduate School of Business, Stanford University, Stanford, California 94305

Published Online: 4 Aug 2022
https://doi.org/10.1287/opre.2022.2274

Abstract

In this note, we introduce a general version of the well-known elliptical potential lemma, a widely used technique in the analysis of algorithms for sequential learning and decision-making problems. We consider a stochastic linear bandit setting in which decision makers sequentially choose among a set of given actions, observe their noisy rewards, and aim to maximize their cumulative expected reward over a decision-making horizon. The elliptical potential lemma is a key tool for quantifying uncertainty in estimating parameters of the reward function, but it requires the noise and the prior distributions to be Gaussian. Our general elliptical potential lemma relaxes this Gaussian requirement, which is a highly nontrivial extension for several reasons: unlike in the Gaussian case, there is no closed-form expression for the covariance matrix of the posterior distribution, the covariance matrix is not a deterministic function of the actions, and the covariance matrix is not decreasing in the positive semidefinite order. Although this result is of broad interest, we showcase one application: an improved Bayesian regret bound for the well-known Thompson sampling algorithm in stochastic linear bandits with changing action sets, where the prior and noise distributions are general. This bound is minimax optimal up to constants.
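For orientation, the following is a minimal sketch of the classical (Gaussian, ridge-regression) form of the lemma that this note generalizes. It is not the note's general lemma. It assumes a prior N(0, λ⁻¹I) and unit noise variance, so the posterior precision after actions x_1, ..., x_T is the deterministic matrix V_t = λI + Σ_{s≤t} x_s x_sᵀ. In that setting, the matrix determinant lemma gives log det V_T − log det V_0 = Σ_t log(1 + ‖x_t‖²_{V_{t−1}⁻¹}). Combining this with min(1, w) ≤ 2 log(1 + w) and the trace-determinant inequality yields Σ_t min(1, ‖x_t‖²_{V_{t−1}⁻¹}) ≤ 2d log(1 + T L²/(dλ)), where L bounds ‖x_t‖. The function name `elliptical_potential` and the parameter `lam` are hypothetical choices made for illustration.

```python
import numpy as np


def elliptical_potential(actions, lam=1.0):
    """Classical (Gaussian/ridge) elliptical potential check; illustrative only.

    actions: array of shape (T, d) whose rows are the chosen actions x_1..x_T.
    lam: hypothetical ridge / prior-precision parameter, so V_0 = lam * I.
    Returns (potential, log_det_ratio, bound) where
      potential     = sum_t min(1, x_t^T V_{t-1}^{-1} x_t)
      log_det_ratio = log det V_T - log det V_0
      bound         = 2 d log(1 + T L^2 / (d lam)), with L = max_t ||x_t||.
    """
    actions = np.asarray(actions, dtype=float)
    T, d = actions.shape
    V = lam * np.eye(d)  # precision matrix V_0 (deterministic in the Gaussian case)
    potential = 0.0
    log_det_ratio = 0.0
    for x in actions:
        w = float(x @ np.linalg.solve(V, x))  # squared norm of x in V^{-1}
        potential += min(1.0, w)
        # Matrix determinant lemma: det(V + x x^T) = det(V) * (1 + w).
        log_det_ratio += float(np.log1p(w))
        V += np.outer(x, x)  # rank-one update V_t = V_{t-1} + x_t x_t^T
    L = float(np.max(np.linalg.norm(actions, axis=1)))
    bound = 2.0 * d * float(np.log1p(T * L**2 / (d * lam)))
    return potential, log_det_ratio, bound


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 5))
    pot, ld, bnd = elliptical_potential(X, lam=1.0)
    print(pot, 2 * ld, bnd)  # expect pot <= 2 * ld <= bnd
```

In the general setting the note treats, the posterior covariance has no closed form and depends on the observed rewards, so this deterministic rank-one recursion is exactly what is no longer available.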

Funding: This work was supported by the National Science Foundation [Grant 1554140].

Volume 71, Issue 4

July-August 2023

Pages iii-vi, 1021-1439, C2-C3

Article Information

Received: March 30, 2021
Accepted: December 21, 2021
Published Online: August 04, 2022

Cite as

Nima Hamidi, Mohsen Bayati (2022) Technical Note—The Elliptical Potential Lemma for General Distributions with an Application to Linear Thompson Sampling. Operations Research 71(4):1434-1439.

https://doi.org/10.1287/opre.2022.2274

Acknowledgments

The authors gratefully acknowledge an insightful suggestion by Ofer Zeitouni.