Technical Note—On Adaptivity in Nonstationary Stochastic Optimization with Bandit Feedback
References
- (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Tauman Kalai A, Mohri M, eds. Proc. 23rd Annual Conf. Learning Theory (COLT 2010) (OmniPress, Haifa, Israel), 28–40.Google Scholar
- (2013) Stochastic convex optimization with bandit feedback. SIAM J. Optim. 23(1):213–240.Crossref, Google Scholar
- (2022) Learning in structured MDPs with convex cost functions: Improved regret bounds for inventory management. Oper. Res. 70(3):1646–1664.Google Scholar
- (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- (1967) Weighted sums of certain dependent random variables. Tohoku Math. J. (2) 19(3):357–367.Crossref, Google Scholar
- (2022) Optimal dynamic regret in proper online learning with strongly convex losses and beyond. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist. (AISTATS 2022) (Proceedings of Machine Learning Research, New York), 1805–1845.Google Scholar
- (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, Berlin).Google Scholar
- (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
- (2011) Contextual bandit algorithms with supervised learning guarantees. Gordon G, Dunson D, Dudik M, eds. Proc. 14th Internat. Conf. Artificial Intelligence and Statist. (AISTATS 2011) (Proceedings of Machine Learning Research, New York), 19–26.Google Scholar
- (2012) Toward minimax policies for online linear optimization with bandit feedback. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learning Theory (COLT 2012) (Proceedings of Machine Learning Research, New York), 41.1–41.14.Google Scholar
- (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
- (2019) Non-stationary reinforcement learning: The blessing of (more) optimism. Preprint, submitted June 7, https://arxiv.org/abs/1906.02922.Google Scholar
- (2022) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.Link, Google Scholar
- (2015) Combinatorial bandits revisited. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Proc. 29th Conf. Neural Inform. Processing Systems (Advances in Neural Information Processing Systems, San Diego, CA), 2116–2124.Google Scholar
- (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Buchsbaum A, ed. Proc. 16th Annual ACM-SIAM Sympos. Discrete Algorithms (SODA 2005) (Association for Computing Machinery, Inc., New York), 385–394.Google Scholar
- (2009) Efficient learning algorithms for changing environments. Bottou L, Littman M, eds. Proc. 26th Internat. Conf. Machine Learn. (ICML 2009) (ICML, Madison, WI), 393–400.Google Scholar
- (1963) Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58(301):13–30.Google Scholar
- (2022) Adversarial bandits with knapsacks. J. ACM 69(6):1–47.Crossref, Google Scholar
- (2003) Stochastic approximation. Ann. Statist. 31(2):391–406.Crossref, Google Scholar
- (2020) Improved regret for zeroth-order adversarial bandit convex optimisation. Math. Statist. Learn. 2(3):311–334.Crossref, Google Scholar
- (2021) Improved regret for zeroth-order stochastic convex bandits. Belkin M, Kpotufe S, eds. Proc. 34th Annual Conf. Learning Theory (COLT 2021) (Proceedings of Machine Learning Research, New York), 2938–2964.Google Scholar
- (2022) Non-stationary continuum-armed bandits for online hyperparameter optimization. Akoglu L, Dong XL, Tang J, eds. Proc. 15th ACM Internat. WSDM Conf. (WSDM 2022) (Association for Computing Machinery, New York), 618–627.Google Scholar
- (2020) Model-free non-stationary RL: Near-optimal regret and applications in multi-agent RL and inventory control. Preprint, submitted October 7, https://arxiv.org/abs/2010.03161.Google Scholar
- (1983) Problem complexity and method efficiency in optimization.Google Scholar
- (2021) Adversarial dueling bandits. Florina Balcan M, Zhu J, Arora R, eds. Proc. 38th Internat. Conf. Machine Learn. (ICML 2021) (Proceedings of Machine Learning Research, New York), 9235–9244.Google Scholar
- (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Proc. 24th Conf. Neural Inform. Processing Systems (Advances in Neural Information Processing Systems, San Diego), 1458–1466.Google Scholar
- (2018) Stochastic zeroth-order optimization in high dimensions. Storkey A, Perez-Cruz F, eds. Proc. 21st Internat. Conf. Artificial Intelligence Statist. (AISTATS 2018) (Proceedings of Machine Learning Research, New York), 1356–1365.Google Scholar
- (2021) Non-stationary reinforcement learning without prior knowledge: An optimal black-box approach. Belkin M, Kpotufe S, eds. Proc. 34th Annual Conf. Learn. Theory (COLT 2021) (Proceedings of Machine Learning Research, New York), 4300–4354.Google Scholar
- (2020) Minimizing dynamic regret and adaptive regret simultaneously. Chiappa S, Calandra R, eds. Proc. 23rd Internat. Conf. Artificial Intelligence Statist. (AISTATS 2020) (Proceedings of Machine Learning Research, New York), 309–319.Google Scholar
- (2018) Dynamic regret of strongly adaptive methods. Bach F, Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn. (ICML 2018) (Proceedings of Machine Learning Research, New York), 5882–5891.Google Scholar
- (2022) Non-stationary online learning with memory and non-stochastic control. Camps-Valls G, Ruiz FJR, Valera I, eds. Proc. 25th Internat. Conf. Artificial Intelligence Statist. (AISTATS 2022) (Proceedings of Machine Learning Research, New York), 1805–1845.Google Scholar
- (2020) Dynamic regret of convex and smooth functions. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Proc. 34th Conf. Neural Inform. Processing Systems, vol. 33 (Neural Information Processing Systems, San Diego), 12510–12520.Google Scholar

