Approximately Stationary Bandits with Knapsacks
Published Online:18 Dec 2025https://doi.org/10.1287/moor.2023.0342
References
- [1] (2014) Bandits with concave rewards and convex knapsacks. Babaioff M, Conitzer V, Easley DA, eds. ACM Conf. Econom. Comput. (ACM, New York), 989–1006.Google Scholar
- [2] (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.Crossref, Google Scholar
- [3] (2013) Bandits with knapsacks. 54th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE Computer Society, Washington, DC), 207–216. Google Scholar
- [4] (2014) Resourceful contextual bandits. Balcan M, Feldman V, Szepesvári C, eds. Proc 27th Conf. Learn. Theory (PMLR, New York), 1109–1134.Google Scholar
- [5] (2017) Learning in repeated auctions with budgets: Regret minimization and equilibrium. Daskalakis C, Babaioff M, Moulin H, eds. Proc. 2017 ACM Conf. Econom. Comput. (ACM, New York), 609. Google Scholar
- [6] (2023) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.Link, Google Scholar
- [7] (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (MIT Press, Cambridge, MA) 199–207.Google Scholar
- [8] (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
- [9] (2012) The best of both worlds: Stochastic and adversarial bandits. Mannor S, Srebro N, Williamson RC, eds. 25th Annual Conf. Learn. Theory (PMLR, New York), 42.1–42.23.Google Scholar
- [10] (2022) Online learning with knapsacks: The best of both worlds. Chaudhuri K, Jegelka S, Song L, Szepesvári C, Niu G, Sabato S, eds. Internat. Conf. Machine Learn. (PMLR, New York), 2767–2783.Google Scholar
- [11] (2023) Best of many worlds guarantees for online learning with knapsacks. Preprint, submitted February 28, https://arxiv.org/abs/2202.13710.Google Scholar
- [12] (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7:1–7:41.Crossref, Google Scholar
- [13] (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.Crossref, Google Scholar
- [14] (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.Crossref, Google Scholar
- [15] (2019) Adversarial bandits with knapsacks. J. ACM 69(6):1–47.Google Scholar
- [16] (2020) Online learning with vector costs and bandits with knapsacks. Abernethy JD, Agarwal S, eds. Conf. Learn. Theory (PMLR, New York), 2286–2305.Google Scholar
- [17] (2022) Non-monotonic resource utilization in the bandits with knapsacks problem. NeurIPS. http://papers.nips.cc/paper\_files/paper/2022/hash/7a62d9a4c03377d1175b8859b4cc16d4-Abstract-Conference.html.Google Scholar
- [18] (2022) Non-stationary bandits with knapsacks. NeurIPS. http://papers.nips.cc/paper\_files/paper/2022/hash/69469da823348084ca8933368ecbf676-Abstract-Conference.html.Google Scholar
- [19] (2018) Stochastic bandits robust to adversarial corruptions. Diakonikolas I, Kempe D, Henzinger M, eds. Proc. 50th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 114–122.Google Scholar
- [20] (2019) Unifying the stochastic and the adversarial bandits with knapsack. Kraus S, ed. Proc. 28th Internat. Joint Conf. Artificial Intelligence (IJCAI, CA), 3311–3317.Google Scholar
- [21] (2018) Combinatorial semi-bandits with knapsacks. Storkey AJ, Pérez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1760–1770.Google Scholar
- [22] (2020) Book announcement: Introduction to multi-armed bandits. SIGecom Exchange 18(1):28–30.Crossref, Google Scholar
- [23] (2023) Contextual bandits with packing and covering constraints: A modular Lagrangian approach via regression. Proc. Machine Learning Res. 195:1–24, Google Scholar
- [24] (2012) Online learning of rested and restless bandits. IEEE Trans. Inform. Theory 58(8):5588–5611.Crossref, Google Scholar
- [25] (2020) Restless-UCB, an efficient and low-complexity algorithm for online restless bandits. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 11878–11889.Google Scholar

