Approximately Stationary Bandits with Knapsacks

Published Online:https://doi.org/10.1287/moor.2023.0342

References

  • [1] Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Babaioff M, Conitzer V, Easley DA, eds. ACM Conf. Econom. Comput. (ACM, New York), 989–1006.Google Scholar
  • [2] Auer P, Cesa-Bianchi N, Freund Y, Schapire RE (2002) The nonstochastic multiarmed bandit problem. SIAM J. Comput. 32(1):48–77.CrossrefGoogle Scholar
  • [3] Badanidiyuru A, Kleinberg R, Slivkins A (2013) Bandits with knapsacks. 54th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE Computer Society, Washington, DC), 207–216. Google Scholar
  • [4] Badanidiyuru A, Langford J, Slivkins A (2014) Resourceful contextual bandits. Balcan M, Feldman V, Szepesvári C, eds. Proc 27th Conf. Learn. Theory (PMLR, New York), 1109–1134.Google Scholar
  • [5] Balseiro SR, Gur Y (2017) Learning in repeated auctions with budgets: Regret minimization and equilibrium. Daskalakis C, Babaioff M, Moulin H, eds. Proc. 2017 ACM Conf. Econom. Comput. (ACM, New York), 609. Google Scholar
  • [6] Balseiro SR, Lu H, Mirrokni V (2023) The best of many worlds: Dual mirror descent for online allocation problems. Oper. Res. 71(1):101–119.LinkGoogle Scholar
  • [7] Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (MIT Press, Cambridge, MA) 199–207.Google Scholar
  • [8] Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • [9] Bubeck S, Slivkins A (2012) The best of both worlds: Stochastic and adversarial bandits. Mannor S, Srebro N, Williamson RC, eds. 25th Annual Conf. Learn. Theory (PMLR, New York), 42.1–42.23.Google Scholar
  • [10] Castiglioni M, Celli A, Kroer C (2022) Online learning with knapsacks: The best of both worlds. Chaudhuri K, Jegelka S, Song L, Szepesvári C, Niu G, Sabato S, eds. Internat. Conf. Machine Learn. (PMLR, New York), 2767–2783.Google Scholar
  • [11] Celli A, Castiglioni M, Kroer C (2023) Best of many worlds guarantees for online learning with knapsacks. Preprint, submitted February 28, https://arxiv.org/abs/2202.13710.Google Scholar
  • [12] Devanur NR, Jain K, Sivan B, Wilkens CA (2019) Near optimal online algorithms and fast approximation algorithms for resource allocation problems. J. ACM 66(1):7:1–7:41.CrossrefGoogle Scholar
  • [13] Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.CrossrefGoogle Scholar
  • [14] Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.CrossrefGoogle Scholar
  • [15] Immorlica N, Sankararaman KA, Schapire RE, Slivkins A (2019) Adversarial bandits with knapsacks. J. ACM 69(6):1–47.Google Scholar
  • [16] Kesselheim T, Singla S (2020) Online learning with vector costs and bandits with knapsacks. Abernethy JD, Agarwal S, eds. Conf. Learn. Theory (PMLR, New York), 2286–2305.Google Scholar
  • [17] Kumar R, Kleinberg R (2022) Non-monotonic resource utilization in the bandits with knapsacks problem. NeurIPS. http://papers.nips.cc/paper\_files/paper/2022/hash/7a62d9a4c03377d1175b8859b4cc16d4-Abstract-Conference.html.Google Scholar
  • [18] Liu S, Jiang J, Li X (2022) Non-stationary bandits with knapsacks. NeurIPS. http://papers.nips.cc/paper\_files/paper/2022/hash/69469da823348084ca8933368ecbf676-Abstract-Conference.html.Google Scholar
  • [19] Lykouris T, Mirrokni VS, Leme RP (2018) Stochastic bandits robust to adversarial corruptions. Diakonikolas I, Kempe D, Henzinger M, eds. Proc. 50th Annual ACM SIGACT Sympos. Theory Comput. (ACM, New York), 114–122.Google Scholar
  • [20] Rangi A, Franceschetti M, Tran-Thanh L (2019) Unifying the stochastic and the adversarial bandits with knapsack. Kraus S, ed. Proc. 28th Internat. Joint Conf. Artificial Intelligence (IJCAI, CA), 3311–3317.Google Scholar
  • [21] Sankararaman KA, Slivkins A (2018) Combinatorial semi-bandits with knapsacks. Storkey AJ, Pérez-Cruz F, eds. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1760–1770.Google Scholar
  • [22] Slivkins A (2020) Book announcement: Introduction to multi-armed bandits. SIGecom Exchange 18(1):28–30.CrossrefGoogle Scholar
  • [23] Slivkins A, Sankararaman KA, Foster DJ (2023) Contextual bandits with packing and covering constraints: A modular Lagrangian approach via regression. Proc. Machine Learning Res. 195:1–24, Google Scholar
  • [24] Tekin C, Liu M (2012) Online learning of rested and restless bandits. IEEE Trans. Inform. Theory 58(8):5588–5611.CrossrefGoogle Scholar
  • [25] Wang S, Huang L, Lui JCS (2020) Restless-UCB, an efficient and low-complexity algorithm for online restless bandits. Larochelle H, Ranzato M, Hadsell R, Balcan M, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 11878–11889.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.