The Online Saddle Point Problem and Online Convex Optimization with Knapsacks

Published Online:https://doi.org/10.1287/moor.2018.0330

References

  • [1] Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI), 263–274.Google Scholar
  • [2] Abernethy J, Lai KA, Levy KY, Wang JK (2018) Faster rates for convex-concave games. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, vol. 75 (PMLR, New York), 1595–1625.Google Scholar
  • [3] Agrawal S, Devanur NR (2014) Bandits with concave rewards and convex knapsacks. Proc. Fifteenth ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 989–1006.Google Scholar
  • [4] Agrawal S, Devanur NR (2015) Fast algorithms for online stochastic convex programming. Indyk P, ed. Proc. 2015 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 1405–1424.Google Scholar
  • [5] Agrawal S, Wang Z, Ye Y (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.LinkGoogle Scholar
  • [6] Ahmadinejad A, Dehghani S, Hajiaghayi M, Lucier B, Mahini H, Seddighin S (2019) From duels to battlefields: Computing equilibria of blotto and other games. Math. Oper. Res. 44(4):1304–1325.LinkGoogle Scholar
  • [7] Arrow KJ, Hurwicz L, Uzawa H, eds. (1958) Studies in Linear and Non-Linear Programming (Stanford University Press, Stanford, CA).Google Scholar
  • [8] Auer P, Cesa-Bianchi N, Freund Y, Schapire R (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 322–331.Google Scholar
  • [9] Aumann RJ (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55(1):1–18.CrossrefGoogle Scholar
  • [10] Badanidiyuru A, Kleinberg R, Slivkins A (2018) Bandits with knapsacks. J. ACM 65(3):13.CrossrefGoogle Scholar
  • [11] Balduzzi D, Racaniere S, Martens J, Foerster J, Tuyls K, Graepel T (2018) The mechanics of n-player differentiable games. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Mach. Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 354–363.Google Scholar
  • [12] Balseiro SR, Gur Y (2019) Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Sci. 65(9):3952–3968.LinkGoogle Scholar
  • [13] Bernstein A, Mannor S, Shimkin N (2010) Online classification with specificity constraints. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (Curran Associates, Inc., Red Hook, NY), 190–198.Google Scholar
  • [14] Besbes O, Zeevi A (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.LinkGoogle Scholar
  • [15] Bowling M (2004) Convergence and no-regret in multiagent learning. Saul L, Weiss Y, Bottou L, eds. Adv. Neural Inform. Processing Systems, vol. 17 (MIT Press, Cambridge, MA), 209–216.Google Scholar
  • [16] Bowling M, Veloso M (2001) Convergence of gradient dynamics with a variable learning rate. Proc. Eighteenth Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 27–34.Google Scholar
  • [17] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [18] Bubeck S, Eldan R, Lee YT (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.CrossrefGoogle Scholar
  • [19] Buchbinder N, Naor J (2009) Online primal-dual algorithms for covering and packing. Math. Oper. Res. 34(2):270–286.LinkGoogle Scholar
  • [20] Cardoso AR, Xu H (2019) Risk-averse stochastic convex bandit. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence and Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 39–47.Google Scholar
  • [21] Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [22] Cesa-Bianchi N, Mansour Y, Stoltz G (2007) Improved second-order bounds for prediction with expert advice. Machine Learn. 66(2–3):321–352.CrossrefGoogle Scholar
  • [23] Chen T, Giannakis GB (2018) Harnessing bandit online learning to low-latency fog computing. 2018 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 6418–6422.Google Scholar
  • [24] Chowdhury SM, Kovenock D, Sheremeta RM (2013) An experimental investigation of colonel blotto games. Econom. Theory 52(3):833–861.CrossrefGoogle Scholar
  • [25] Conitzer V, Sandholm T (2007) Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learn. 67(1–2):23–43.CrossrefGoogle Scholar
  • [26] Cox B, Juditsky A, Nemirovski A (2017) Decomposition techniques for bilinear saddle point problems and variational inequalities with affine monotone operators. J. Optim. Theory Appl. 172(2):402–435.CrossrefGoogle Scholar
  • [27] Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.LinkGoogle Scholar
  • [28] Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. Sixteenth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
  • [29] Gupta A, Molinaro M (2016) How the experts algorithm can help solve LPs online. Math. Oper. Res. 41(4):1404–1431.LinkGoogle Scholar
  • [30] Hazan E (2016) Introduction to Online Convex Optimization, Foundations and Trends in Optimization (Now Foundations and Trends, Boston).CrossrefGoogle Scholar
  • [31] Hazan E, Kale S (2014) Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization. J. Machine Learn. Res. 15(1):2489–2512.Google Scholar
  • [32] Hazan E, Li Y (2016) An optimal algorithm for bandit convex optimization. Preprint, submitted March 15, https://arxiv.org/abs/1603.04350.Google Scholar
  • [33] Hazan E, Agarwal A, Kale S (2007) Logarithmic regret algorithms for online convex optimization. Machine Learn. 69(2–3):169–192.CrossrefGoogle Scholar
  • [34] Ho-Nguyen N, Kılınç-Karzan F (2016) The role of flexibility in structure-based acceleration for online convex optimization. Technical report, Carnegie Mellon University, Pittsburgh.Google Scholar
  • [35] Immorlica N, Sankararaman KA, Schapire R, Slivkins A (2019) Adversarial bandits with knapsacks. 60th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 202–219.Google Scholar
  • [36] Jenatton R, Huang J, Archambeau C (2016) Adaptive algorithms for online convex optimization with long-term constraints. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 402–411.Google Scholar
  • [37] Kalai A, Vempala S (2002) Efficient algorithms for universal portfolios. J. Machine Learn. Res. 3(Nov):423–440.Google Scholar
  • [38] Kovenock D, Roberson B (2012) Coalitional Colonel Blotto games with application to the economics of alliances. J. Public Econom. Theory 14(4):653–676.CrossrefGoogle Scholar
  • [39] Lan G, Zhou Z (2016) Algorithms for stochastic optimization with functional or expectation constraints. Preprint, submitted April 13, https://arxiv.org/abs/1604.03887.Google Scholar
  • [40] Laslier JF, Picard N (2002) Distributive politics and electoral competition. J. Econom. Theory 103(1):106–130.CrossrefGoogle Scholar
  • [41] Lu Z, Nemirovski A, Monteiro RD (2007) Large-scale semidefinite programming via a saddle point mirror-prox algorithm. Math. Programming 109(2–3):211–237.CrossrefGoogle Scholar
  • [42] Mahdavi M, Jin R, Yang T (2012) Trading regret for efficiency: Online convex optimization with long term constraints. J. Machine Learn. Res. 13(Sep):2503–2528.Google Scholar
  • [43] Mahdavi M, Yang T, Jin R (2017) Online decision making under stochastic constraints. NIPS Workshop Discrete Optim. Machine Learn.Google Scholar
  • [44] Mannor S, Tsitsiklis JN, Yu JY (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10(Mar):569–590.Google Scholar
  • [45] Myerson RB (1993) Incentives to cultivate favored minorities under alternative electoral systems. Amer. Political Sci. Rev. 87(4):856–869.CrossrefGoogle Scholar
  • [46] Neely MJ, Yu H (2017) Online convex optimization with time-varying constraints. Preprint, submitted February 15, https://arxiv.org/abs/1702.04783.Google Scholar
  • [47] Nemirovski A (2010) Deterministic and randomized first order saddle point methods for large-scale convex optimization. Technical report, Georgia Institute of Technology, Atlanta.Google Scholar
  • [48] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [49] O’Donoghue B, Lattimore T, Osband I (2020) Matrix games with bandit feedback. Preprint, submitted June 9, https://arxiv.org/abs/2006.05145.Google Scholar
  • [50] Paternain S, Ribeiro A (2017) Online learning of feasible strategies in unknown environments. IEEE Trans. Automatic Control 62(6):2807–2822.Google Scholar
  • [51] Shalev-Shwartz S (2012) Online Learning and Online Convex Optimization, Foundations and Trends in Machine Learning (Now Foundations and Trends, Boston).Google Scholar
  • [52] Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2009) Stochastic convex optimization. Proc. 22nd Annual Conf. Learn. Theory.Google Scholar
  • [53] Singh S, Kearns M, Mansour Y (2000) Nash convergence of gradient dynamics in general-sum games. Proc. Sixteenth Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 541–548.Google Scholar
  • [54] Spall JC (1997) A one-measurement form of simultaneous perturbation stochastic approximation. Automatica J. IFAC 33(1):109–112.CrossrefGoogle Scholar
  • [55] Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Inc., Red Hook, NY), 433–441.Google Scholar
  • [56] Yu H, Neely MJ (2020) A low complexity algorithm with o(t) regret and finite constraint violations for online convex optimization with long term constraints. J. Machine Learn. Res. 21(1):1–24.Google Scholar
  • [57] Yu H, Neely M, Wei X (2017) Online convex optimization with stochastic constraints. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 1427–1437.Google Scholar
  • [58] Yuan J, Lamperski A (2018) Online convex optimization for cumulative constraints. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 6137–6146.Google Scholar
  • [59] Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Proc. 20th Internat. Conf. Machine Learn. (AAAI Press, Palo Alto, CA), 928–935.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.