Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Aurélien Garivier
Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier–The French National Research Center (CNRS), 31062 Toulouse, France

Pierre Ménard
Institut de Mathématiques de Toulouse (IMT): Université Paul Sabatier–The French National Research Center (CNRS), 31062 Toulouse, France

Gilles Stoltz (Corresponding Author)
http://orcid.org/0000-0003-1240-1007
HEC Paris Management Research Group (GREGHEC): HEC Paris–CNRS, 78351 Jouy-en-Josas, France

Published Online: 20 Sep 2018
https://doi.org/10.1287/moor.2017.0928

Abstract

We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.
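For orientation, the "well-known logarithmic growth" mentioned in the abstract is usually stated as the asymptotic lower bound of Lai and Robbins: for Bernoulli arms, the regret of any uniformly good strategy eventually grows at least like C ln T, where C is the sum over suboptimal arms of (mu* - mu_a) / KL(mu_a, mu*). The sketch below only evaluates that classical constant. It is an illustration under the Bernoulli assumption, not the nonasymptotic bounds of the paper, and the function names `bernoulli_kl` and `log_regret_constant` are hypothetical.

```python
import math


def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence KL(Ber(p), Ber(q)) in nats.

    Uses the convention 0 * log(0) = 0 and requires 0 < q < 1 so that
    the divergence is finite.
    """
    if not (0.0 <= p <= 1.0 and 0.0 < q < 1.0):
        raise ValueError("need 0 <= p <= 1 and 0 < q < 1")
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def log_regret_constant(means: list[float]) -> float:
    """Constant C in the classical asymptotic bound: regret >= C * ln(T).

    Assumes Bernoulli arms with the given means and a best mean below 1.
    Arms tied with the best mean contribute nothing to the sum.
    """
    best = max(means)
    return sum(
        (best - m) / bernoulli_kl(m, best) for m in means if m < best
    )


if __name__ == "__main__":
    # Two arms with means 0.4 and 0.5 (values chosen only for illustration).
    print(log_regret_constant([0.4, 0.5]))
```

The constant grows as the gap between arms shrinks, because the divergence in the denominator decreases quadratically while the gap in the numerator decreases only linearly. Problems with close arms therefore need long horizons before the logarithmic regime becomes visible, which is consistent with the initial phase of almost linear regret described in the abstract.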

Mathematics of Operations Research

Volume 44, Issue 2

May 2019

Pages 377-766, C2

Article Information

Received: June 10, 2016
Accepted: December 08, 2017
Published Online: September 20, 2018

Cite as

Aurélien Garivier, Pierre Ménard, Gilles Stoltz (2018) Explore First, Exploit Next: The True Shape of Regret in Bandit Problems. Mathematics of Operations Research 44(2):377-399.

https://doi.org/10.1287/moor.2017.0928
