Explore First, Exploit Next: The True Shape of Regret in Bandit Problems

Published Online:https://doi.org/10.1287/moor.2017.0928

We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain nonasymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback–Leibler divergences. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. The proof techniques come to the essence of the information-theoretic arguments used and they involve no unnecessary complications.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.