Technical Note—On Adaptivity in Nonstationary Stochastic Optimization with Bandit Feedback
Abstract
We study the nonstationary stochastic optimization problem with bandit feedback and dynamic regret measures. The seminal work of Besbes et al. (2015) shows that, when the aggregated function changes are known a priori, a simple restarting algorithm attains the optimal dynamic regret. In this work, we design a stochastic optimization algorithm with fixed step sizes, which, combined with the multiscale sampling framework from existing research, achieves the optimal dynamic regret in nonstationary stochastic optimization without prior knowledge of the function-change budget, thereby closing a question that has been open for a while. We also establish an additional result showing that any algorithm achieving good regret against stationary benchmarks with high probability can be automatically converted into an algorithm that achieves good regret against dynamic benchmarks (for problems that admit regret against stationary benchmarks in fully adversarial settings, a dynamic regret of is expected). This conversion is potentially applicable to a wide class of bandit convex optimization algorithms and other types of bandit algorithms.

