[1] Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI), 263–274.Google Scholar
[2] Abernethy J, Lai KA, Levy KY, Wang JK (2018) Faster rates for convex-concave games. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, vol. 75 (PMLR, New York), 1595–1625.Google Scholar
[3] Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Proc. Fifteenth ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 989–1006.Google Scholar
[4] Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Indyk P, ed. Proc. 2015 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 1405–1424.Google Scholar
[5] Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
[6] Ahmadinejad A, Dehghani S, Hajiaghayi M, Lucier B, Mahini H, Seddighin S (2019) From duels to battlefields: Computing equilibria of blotto and other games. Math. Oper. Res. 44(4):1304–1325.Link, Google Scholar
[7] Arrow KJ, Hurwicz L, Uzawa H, eds. (1958) Studies in Linear and Non-Linear Programming (Stanford University Press, Stanford, CA).Google Scholar
[8] Auer P, Cesa-Bianchi N, Freund Y, Schapire R (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 322–331.Google Scholar
[9] Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55(1):1–18.Crossref, Google Scholar
[10] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):13.Crossref, Google Scholar
[11] Balduzzi D, Racaniere S, Martens J, Foerster J, Tuyls K, Graepel T (2018) The mechanics of n-player differentiable games. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Mach. Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 354–363.Google Scholar
[12] Balseiro SR, Gur Y (2019) Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Sci. 65(9):3952–3968.Link, Google Scholar
[13] Bernstein A, Mannor S, Shimkin N (2010) Online classification with specificity constraints. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (Curran Associates, Inc., Red Hook, NY), 190–198.Google Scholar
[14] Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
[15] Bowling M (2004) Convergence and no-regret in multiagent learning. Saul L, Weiss Y, Bottou L, eds. Adv. Neural Inform. Processing Systems, vol. 17 (MIT Press, Cambridge, MA), 209–216.Google Scholar
[16] Bowling M, Veloso M (2001) Convergence of gradient dynamics with a variable learning rate. Proc. Eighteenth Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 27–34.Google Scholar
[17] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[18] Bubeck S, Eldan R, Lee YT (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
[19] Buchbinder N, Naor J (2009) Online primal-dual algorithms for covering and packing. Math. Oper. Res. 34(2):270–286.Link, Google Scholar
[20] Cardoso AR, Xu H (2019) Risk-averse stochastic convex bandit. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence and Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 39–47.Google Scholar
[21] Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[22] Cesa-Bianchi N, Mansour Y, Stoltz G (2007) Improved second-order bounds for prediction with expert advice. Machine Learn. 66(2–3):321–352.Crossref, Google Scholar
[23] Chen T, Giannakis GB (2018) Harnessing bandit online learning to low-latency fog computing. 2018 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 6418–6422.Google Scholar
[24] Chowdhury SM, Kovenock D, Sheremeta RM (2013) An experimental investigation of colonel blotto games. Econom. Theory 52(3):833–861.Crossref, Google Scholar
[25] Conitzer V, Sandholm T (2007) Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learn. 67(1–2):23–43.Crossref, Google Scholar
[26] Cox B, Juditsky A, Nemirovski A (2017) Decomposition techniques for bilinear saddle point problems and variational inequalities with affine monotone operators. J. Optim. Theory Appl. 172(2):402–435.Crossref, Google Scholar
[27] Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Link, Google Scholar
[28] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. Sixteenth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
[29] Gupta A, Molinaro M (2016) How the experts algorithm can help solve LPs online. Math. Oper. Res. 41(4):1404–1431.Link, Google Scholar
[30] Hazan E (2016) Introduction to Online Convex Optimization, Foundations and Trends in Optimization (Now Foundations and Trends, Boston).Crossref, Google Scholar
[31] Hazan E, Kale S (2014) Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization. J. Machine Learn. Res. 15(1):2489–2512.Google Scholar
[32] Hazan E, Li Y (2016) An optimal algorithm for bandit convex optimization. Preprint, submitted March 15, https://arxiv.org/abs/1603.04350.Google Scholar
[33] Hazan E, Agarwal A, Kale S (2007) Logarithmic regret algorithms for online convex optimization. Machine Learn. 69(2–3):169–192.Crossref, Google Scholar
[34] Ho-Nguyen N, Kılınç-Karzan F (2016) The role of flexibility in structure-based acceleration for online convex optimization. Technical report, Carnegie Mellon University, Pittsburgh.Google Scholar
[35] Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2019) Adversarial bandits with knapsacks. 60th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 202–219.Google Scholar
[36] Jenatton R, Huang J, Archambeau C (2016) Adaptive algorithms for online convex optimization with long-term constraints. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 402–411.Google Scholar
[37] Kalai A, Vempala S (2002) Efficient algorithms for universal portfolios. J. Machine Learn. Res. 3(Nov):423–440.Google Scholar
[38] Kovenock D, Roberson B (2012) Coalitional Colonel Blotto games with application to the economics of alliances. J. Public Econom. Theory 14(4):653–676.Crossref, Google Scholar
[39] Lan G, Zhou Z (2016) Algorithms for stochastic optimization with functional or expectation constraints. Preprint, submitted April 13, https://arxiv.org/abs/1604.03887.Google Scholar
[40] Laslier JF, Picard N (2002) Distributive politics and electoral competition. J. Econom. Theory 103(1):106–130.Crossref, Google Scholar
[41] Lu Z, Nemirovski A, Monteiro RD (2007) Large-scale semidefinite programming via a saddle point mirror-prox algorithm. Math. Programming 109(2–3):211–237.Crossref, Google Scholar
[42] Mahdavi M, Jin R, Yang T (2012) Trading regret for efficiency: Online convex optimization with long term constraints. J. Machine Learn. Res. 13(Sep):2503–2528.Google Scholar
[43] Mahdavi M, Yang T, Jin R (2017) Online decision making under stochastic constraints. NIPS Workshop Discrete Optim. Machine Learn.Google Scholar
[44] Mannor S, Tsitsiklis JN, Yu JY (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10(Mar):569–590.Google Scholar
[45] Myerson RB (1993) Incentives to cultivate favored minorities under alternative electoral systems. Amer. Political Sci. Rev. 87(4):856–869.Crossref, Google Scholar
[46] Neely MJ, Yu H (2017) Online convex optimization with time-varying constraints. Preprint, submitted February 15, https://arxiv.org/abs/1702.04783.Google Scholar
[47] Nemirovski A (2010) Deterministic and randomized first order saddle point methods for large-scale convex optimization. Technical report, Georgia Institute of Technology, Atlanta.Google Scholar
[48] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[49] O’Donoghue B, Lattimore T, Osband I (2020) Matrix games with bandit feedback. Preprint, submitted June 9, https://arxiv.org/abs/2006.05145.Google Scholar
[50] Paternain S, Ribeiro A (2017) Online learning of feasible strategies in unknown environments. IEEE Trans. Automatic Control 62(6):2807–2822.Google Scholar
[51] Shalev-Shwartz S (2012) Online Learning and Online Convex Optimization, Foundations and Trends in Machine Learning (Now Foundations and Trends, Boston).Google Scholar
[52] Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2009) Stochastic convex optimization. Proc. 22nd Annual Conf. Learn. Theory.Google Scholar
[53] Singh S, Kearns M, Mansour Y (2000) Nash convergence of gradient dynamics in general-sum games. Proc. Sixteenth Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 541–548.Google Scholar
[54] Spall JC (1997) A one-measurement form of simultaneous perturbation stochastic approximation. Automatica J. IFAC 33(1):109–112.Crossref, Google Scholar
[55] Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Inc., Red Hook, NY), 433–441.Google Scholar
[56] Yu H, Neely MJ (2020) A low complexity algorithm with o(t) regret and finite constraint violations for online convex optimization with long term constraints. J. Machine Learn. Res. 21(1):1–24.Google Scholar
[57] Yu H, Neely M, Wei X (2017) Online convex optimization with stochastic constraints. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 1427–1437.Google Scholar
[58] Yuan J, Lamperski A (2018) Online convex optimization for cumulative constraints. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 6137–6146.Google Scholar
[59] Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Proc. 20th Internat. Conf. Machine Learn. (AAAI Press, Palo Alto, CA), 928–935.Google Scholar

cover image Mathematics of Operations Research

Volume 50, Issue 1

February 2025

Pages 1-781 C2

Article Information

Metrics

Information

Received:December 10, 2018
Accepted:November 19, 2023
Published Online:January 12, 2024

Cite as

Adrian Rivera Cardoso; , He Wang; , Huan Xu (2024) The Online Saddle Point Problem and Online Convex Optimization with Knapsacks. Mathematics of Operations Research 50(1):1-39.

https://doi.org/10.1287/moor.2018.0330

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

References

Volume 50, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News