The Online Saddle Point Problem and Online Convex Optimization with Knapsacks
Published Online:12 Jan 2024https://doi.org/10.1287/moor.2018.0330
References
- [1] (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI), 263–274.Google Scholar
- [2] (2018) Faster rates for convex-concave games. Bubeck S, Perchet V, Rigollet P, eds. Proc. 31st Conf. Learn. Theory, vol. 75 (PMLR, New York), 1595–1625.Google Scholar
- [3] (2014) Bandits with concave rewards and convex knapsacks. Proc. Fifteenth ACM Conf. Econom. Comput. (Association for Computing Machinery, New York), 989–1006.Google Scholar
- [4] (2015) Fast algorithms for online stochastic convex programming. Indyk P, ed. Proc. 2015 Annual ACM-SIAM Sympos. Discrete Algorithms (SODA) (SIAM, Philadelphia), 1405–1424.Google Scholar
- [5] (2014) A dynamic near-optimal algorithm for online linear programming. Oper. Res. 62(4):876–890.Link, Google Scholar
- [6] (2019) From duels to battlefields: Computing equilibria of blotto and other games. Math. Oper. Res. 44(4):1304–1325.Link, Google Scholar
- [7] Arrow KJ, Hurwicz L, Uzawa H, eds. (1958) Studies in Linear and Non-Linear Programming (Stanford University Press, Stanford, CA).Google Scholar
- [8] (1995) Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proc. 36th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 322–331.Google Scholar
- [9] (1987) Correlated equilibrium as an expression of Bayesian rationality. Econometrica 55(1):1–18.Crossref, Google Scholar
- [10] (2018) Bandits with knapsacks. J. ACM 65(3):13.Crossref, Google Scholar
- [11] (2018) The mechanics of n-player differentiable games. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Mach. Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 354–363.Google Scholar
- [12] (2019) Learning in repeated auctions with budgets: Regret minimization and equilibrium. Management Sci. 65(9):3952–3968.Link, Google Scholar
- [13] (2010) Online classification with specificity constraints. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (Curran Associates, Inc., Red Hook, NY), 190–198.Google Scholar
- [14] (2012) Blind network revenue management. Oper. Res. 60(6):1537–1550.Link, Google Scholar
- [15] (2004) Convergence and no-regret in multiagent learning. Saul L, Weiss Y, Bottou L, eds. Adv. Neural Inform. Processing Systems, vol. 17 (MIT Press, Cambridge, MA), 209–216.Google Scholar
- [16] (2001) Convergence of gradient dynamics with a variable learning rate. Proc. Eighteenth Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers Inc., San Francisco), 27–34.Google Scholar
- [17] (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [18] (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
- [19] (2009) Online primal-dual algorithms for covering and packing. Math. Oper. Res. 34(2):270–286.Link, Google Scholar
- [20] (2019) Risk-averse stochastic convex bandit. Chaudhuri K, Sugiyama M, eds. Proc. Twenty-Second Internat. Conf. Artificial Intelligence and Statist., Proceedings of Machine Learning Research, vol. 89 (PMLR, New York), 39–47.Google Scholar
- [21] (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [22] (2007) Improved second-order bounds for prediction with expert advice. Machine Learn. 66(2–3):321–352.Crossref, Google Scholar
- [23] (2018) Harnessing bandit online learning to low-latency fog computing. 2018 IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP) (IEEE, Piscataway, NJ), 6418–6422.Google Scholar
- [24] (2013) An experimental investigation of colonel blotto games. Econom. Theory 52(3):833–861.Crossref, Google Scholar
- [25] (2007) Awesome: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learn. 67(1–2):23–43.Crossref, Google Scholar
- [26] (2017) Decomposition techniques for bilinear saddle point problems and variational inequalities with affine monotone operators. J. Optim. Theory Appl. 172(2):402–435.Crossref, Google Scholar
- [27] (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Link, Google Scholar
- [28] (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. Proc. Sixteenth Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 385–394.Google Scholar
- [29] (2016) How the experts algorithm can help solve LPs online. Math. Oper. Res. 41(4):1404–1431.Link, Google Scholar
- [30] (2016) Introduction to Online Convex Optimization, Foundations and Trends in Optimization (Now Foundations and Trends, Boston).Crossref, Google Scholar
- [31] (2014) Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization. J. Machine Learn. Res. 15(1):2489–2512.Google Scholar
- [32] (2016) An optimal algorithm for bandit convex optimization. Preprint, submitted March 15, https://arxiv.org/abs/1603.04350.Google Scholar
- [33] (2007) Logarithmic regret algorithms for online convex optimization. Machine Learn. 69(2–3):169–192.Crossref, Google Scholar
- [34] (2016) The role of flexibility in structure-based acceleration for online convex optimization. Technical report, Carnegie Mellon University, Pittsburgh.Google Scholar
- [35] (2019) Adversarial bandits with knapsacks. 60th Annual Sympos. Foundations Comput. Sci. (FOCS) (IEEE, Piscataway, NJ), 202–219.Google Scholar
- [36] (2016) Adaptive algorithms for online convex optimization with long-term constraints. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 402–411.Google Scholar
- [37] (2002) Efficient algorithms for universal portfolios. J. Machine Learn. Res. 3(Nov):423–440.Google Scholar
- [38] (2012) Coalitional Colonel Blotto games with application to the economics of alliances. J. Public Econom. Theory 14(4):653–676.Crossref, Google Scholar
- [39] (2016) Algorithms for stochastic optimization with functional or expectation constraints. Preprint, submitted April 13, https://arxiv.org/abs/1604.03887.Google Scholar
- [40] (2002) Distributive politics and electoral competition. J. Econom. Theory 103(1):106–130.Crossref, Google Scholar
- [41] (2007) Large-scale semidefinite programming via a saddle point mirror-prox algorithm. Math. Programming 109(2–3):211–237.Crossref, Google Scholar
- [42] (2012) Trading regret for efficiency: Online convex optimization with long term constraints. J. Machine Learn. Res. 13(Sep):2503–2528.Google Scholar
- [43] (2017) Online decision making under stochastic constraints. NIPS Workshop Discrete Optim. Machine Learn.Google Scholar
- [44] (2009) Online learning with sample path constraints. J. Machine Learn. Res. 10(Mar):569–590.Google Scholar
- [45] (1993) Incentives to cultivate favored minorities under alternative electoral systems. Amer. Political Sci. Rev. 87(4):856–869.Crossref, Google Scholar
- [46] (2017) Online convex optimization with time-varying constraints. Preprint, submitted February 15, https://arxiv.org/abs/1702.04783.Google Scholar
- [47] (2010) Deterministic and randomized first order saddle point methods for large-scale convex optimization. Technical report, Georgia Institute of Technology, Atlanta.Google Scholar
- [48] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [49] (2020) Matrix games with bandit feedback. Preprint, submitted June 9, https://arxiv.org/abs/2006.05145.Google Scholar
- [50] (2017) Online learning of feasible strategies in unknown environments. IEEE Trans. Automatic Control 62(6):2807–2822.Google Scholar
- [51] (2012) Online Learning and Online Convex Optimization, Foundations and Trends in Machine Learning (Now Foundations and Trends, Boston).Google Scholar
- [52] (2009) Stochastic convex optimization. Proc. 22nd Annual Conf. Learn. Theory.Google Scholar
- [53] (2000) Nash convergence of gradient dynamics in general-sum games. Proc. Sixteenth Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann Publishers Inc., San Francisco), 541–548.Google Scholar
- [54] (1997) A one-measurement form of simultaneous perturbation stochastic approximation. Automatica J. IFAC 33(1):109–112.Crossref, Google Scholar
- [55] (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Inc., Red Hook, NY), 433–441.Google Scholar
- [56] (2020) A low complexity algorithm with o(t) regret and finite constraint violations for online convex optimization with long term constraints. J. Machine Learn. Res. 21(1):1–24.Google Scholar
- [57] (2017) Online convex optimization with stochastic constraints. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates, Inc., Red Hook, NY), 1427–1437.Google Scholar
- [58] (2018) Online convex optimization for cumulative constraints. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 6137–6146.Google Scholar
- [59] (2003) Online convex programming and generalized infinitesimal gradient ascent. Proc. 20th Internat. Conf. Machine Learn. (AAAI Press, Palo Alto, CA), 928–935.Google Scholar

