Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

Michael Jordan
Michael Jordan
[email protected]
https://orcid.org/0000-0001-8935-817X
Department of Electrical Engineering and Computer Science and Department of Statistics, University of California, Berkeley, Berkeley, California 94720
Search for more papers by this author
,
Tianyi Lin
Tianyi Lin
[email protected]
https://orcid.org/0000-0002-5323-1852
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027
Search for more papers by this author
,
Zhengyuan Zhou
Corresponding Author
Zhengyuan Zhou
[email protected]
https://orcid.org/0000-0002-0005-9411
Stern School of Business, New York University, New York, New York 10012
Search for more papers by this author

Department of Electrical Engineering and Computer Science and Department of Statistics, University of California, Berkeley, Berkeley, California 94720

Search for more papers by this author

Tianyi Lin

[email protected]

https://orcid.org/0000-0002-5323-1852

Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Search for more papers by this author

Zhengyuan Zhou

Corresponding Author

Zhengyuan Zhou

[email protected]

https://orcid.org/0000-0002-0005-9411

Stern School of Business, New York University, New York, New York 10012

Search for more papers by this author

Published Online:23 May 2024https://doi.org/10.1287/opre.2022.0446

References

Abernethy J, Bartlett PL, Rakhlin A, Tewari A (2008) Optimal strategies and minimax lower bounds for online convex games. COLT (Omnipress, Madison, MI), 414–424.Google Scholar
Alacaoglu A, Malitsky Y (2022) Stochastic variance reduction for variational inequality methods. COLT (PMLR, New York), 778–816.Google Scholar
Antonakopoulos K, Belmega V, Mertikopoulos P (2021) Adaptive extra-gradient methods for min-max optimization and games. ICLR, 1–12.Google Scholar
Ba W, Lin T, Zhang J, Zhou Z (2021) Doubly optimal no-regret online learning in strongly monotone games with bandit feedback. Preprint, submitted December 6, https://arxiv.org/abs/2112.02856.Google Scholar
Baby D, Wang YX (2021) Optimal dynamic regret in exp-concave online learning. COLT (PMLR, New York), 359–409.Google Scholar
Baby D, Wang YX (2022) Optimal dynamic regret in proper online learning with strongly convex losses and beyond. AISTATS (PMLR, New York), 1805–1845.Google Scholar
Bach F, Levy KY (2019) A universal algorithm for variational inequalities adaptive to smoothness and noise. COLT (PMLR, New York), 164–194.Google Scholar
Balamurugan P, Bach F (2016) Stochastic variance reduction methods for saddle-point problems. NIPS, 1416–1424.Google Scholar
Balandat M, Krichene W, Tomlin C, Bayen A (2016) Minimizing regret on reflexive Banach spaces and Nash equilibria in continuous zero-sum games. NIPS, 154–162.Google Scholar
Bauschke HH, Combettes PL (2011) Convex Analysis and Monotone Operator Theory in Hilbert Spaces, vol. 408 (Springer, Berlin, Heidelberg).Crossref, Google Scholar
Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
Bloembergen D, Tuyls K, Hennes D, Kaisers M (2015) Evolutionary dynamics of multi-agent learning: A survey. J. Artificial Intelligence Res. 53:659–697.Crossref, Google Scholar
Blum A (1998) Online algorithms in machine learning. Online Algorithms (Springer, Berlin, Heidelberg), 306–325.Crossref, Google Scholar
Blum A, Mansour Y (2007) From external to internal regret. J. Machine Learn. Res. 8(6):1307–1324.Google Scholar
Bravo M, Leslie D, Mertikopoulos P (2018) Bandit learning in concave N-person games. NeurIPS, 5661–5671.Google Scholar
Bubeck S, Eldan R, Lee YT (2021) Kernel-based methods for bandit convex optimization. J. ACM 68(4):1–35.Crossref, Google Scholar
Cai Y, Oikonomou A, Zheng W (2022) Finite-time last-iterate convergence for learning in multi-player games. NeurIPS, 33904–33919.Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, Learning, and Games (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Chen Y, Lan G, Ouyang Y (2017) Accelerated schemes for a class of variational inequalities. Math. Programming 165:113–149.Crossref, Google Scholar
Cover TM (1991) Universal portfolios. Math. Finance 1(1):1–29.Crossref, Google Scholar
Cover TM (1999) Elements of Information Theory (John Wiley & Sons, Hoboken, NJ).Google Scholar
Cutkosky A (2020a) Better full-matrix regret via parameter-free online learning. NeurIPS, 8836–8846.Google Scholar
Cutkosky A (2020b) Parameter-free, dynamic, and strongly-adaptive online learning. ICML (PMLR, New York), 2250–2259.Google Scholar
Cutkosky A, Orabona F (2018) Black-box reductions for parameter-free online learning in Banach spaces. COLT (PMLR, New York), 1493–1529.Google Scholar
Daniely A, Gonen A, Shalev-Shwartz S (2015) Strongly adaptive online learning. ICML, 1405–1411.Google Scholar
Debreu G (1952) A social equilibrium existence theorem. Proc. Natl. Acad. Sci. USA 38(10):886–893.Crossref, Google Scholar
Defazio A, Bach F, Lacoste-Julien S (2014) SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. NIPS, 1646–1654.Google Scholar
Duchi J, Hazan E, Singer Y (2011) Adaptive subgradient methods for online learning and stochastic optimization. J. Machine Learn. Res. 12(61):2121–2159.Google Scholar
Facchinei F, Pang JS (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, Berlin, Heidelberg).Google Scholar
Fernholz ER (2002) Stochastic Portfolio Theory, vol. 48 (Springer Science & Business Media, Berlin, Heidelberg).Crossref, Google Scholar
Flaxman AD, Kalai AT, McMahan HB (2005) Online convex optimization in the bandit setting: Gradient descent without a gradient. SODA, 385–394.Google Scholar
Foster DJ, Rakhlin A, Sridharan K (2015) Adaptive online learning. NeurIPS, 3375–3383.Google Scholar
Foster DJ, Kale S, Mohri M, Sridharan K (2017) Parameter-free online learning via model selection. NeurIPS, 6022–6032.Google Scholar
Garber D, Kretzu B (2022) New projection-free algorithms for online convex optimization with adaptive regret guarantees. COLT (PMLR, New York), 2326–2359.Google Scholar
Goldsmith A (2005) Wireless Communications (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Golowich N, Pattathil S, Daskalakis C (2020) Tight last-iterate convergence rates for no-regret learning in multi-player games. NeurIPS, 20766–20778.Google Scholar
Hazan E (2016) Introduction to Online Convex Optimization (MIT Press, Cambridge, MA), 157–325.Google Scholar
Hazan E, Kakade S (2019) Revisiting the Polyak step size. Preprint, submitted May 1, https://arxiv.org/abs/1905.00313.Google Scholar
Hazan E, Kale S (2014) Beyond the regret minimization barrier: Optimal algorithms for stochastic strongly-convex optimization. J. Machine Learn. Res. 15(1):2489–2512.Google Scholar
Hazan E, Seshadhri C (2007) Adaptive algorithms for online decision problems. ECCC 14(88).Google Scholar
Hazan E, Agarwal A, Kale S (2007) Logarithmic regret algorithms for online convex optimization. Machine Learn. 69(2–3):169–192.Crossref, Google Scholar
Hazan E, Koren T, Levy KY (2014) Logistic regression: Tight bounds for stochastic and online optimization. COLT (PMLR, New York), 197–209.Google Scholar
Hsieh YG, Antonakopoulos K, Mertikopoulos P (2021) Adaptive learning in continuous games: Optimal regret bounds and convergence to Nash equilibrium. COLT (PMLR, New York), 2388–2422.Google Scholar
Huang K, Zhang S (2022) New first-order algorithms for stochastic variational inequalities. SIAM J. Optim. 32(4):2745–2772.Crossref, Google Scholar
Huang K, Wang N, Zhang S (2022) An accelerated variance reduced extrapoint approach to finite-sum VI and optimization. Preprint, submitted November 7, https://arxiv.org/abs/2211.03269.Google Scholar
Huh WT, Rusmevichientong P (2009) A nonparametric asymptotic analysis of inventory planning with censored demand. Math. Oper. Res. 34(1):103–123.Link, Google Scholar
Iusem AN, Jofré A, Oliveira RI, Thompson P (2017) Extragradient method with variance reduction for stochastic variational inequalities. SIAM J. Optim. 27(2):686–724.Crossref, Google Scholar
Iusem AN, Jofré A, Oliveira RI, Thompson P (2019) Variance-based extragradient methods with line search for stochastic variational inequalities. SIAM J. Optim. 29(1):175–206.Crossref, Google Scholar
Jadbabaie A, Rakhlin A, Shahrampour S, Sridharan K (2015) Online optimization: Competing with dynamic comparators. AISTATS (PMLR, New York), 398–406.Google Scholar
Jalilzadeh A, Shanbhag UV (2019) A proximal-point algorithm with variable sample-sizes (PPAWSS) for monotone stochastic variational inequality problems. WSC (IEEE, Piscataway, NJ), 3551–3562.Google Scholar
Jiang H, Xu H (2008) Stochastic approximation approaches to the stochastic variational inequality problem. IEEE Trans. Automatic Control 53(6):1462–1475.Crossref, Google Scholar
Jin Y, Sidford A, Tian K (2022) Sharper rates for separable minimax and finite sum optimization via primal-dual extragradient methods. COLT (PMLR, New York), 4362–4415.Google Scholar
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. NIPS, 315–323.Google Scholar
Juditsky A, Nemirovski A, Tauvel C (2011) Solving variational inequalities with stochastic mirror-prox algorithm. Stochastic Systems 1(1):17–58.Link, Google Scholar
Juditsky A, Rigollet P, Tsybakov AB (2008) Learning by mirror averaging. Ann. Statist. 36(5):2183–2206.Crossref, Google Scholar
Jun KS, Orabona F (2019) Parameter-free online convex optimization with sub-exponential noise. COLT (PMLR, New York), 1802–1823.Google Scholar
Jun KS, Orabona F, Wright S, Willett R (2017a) Improved strongly adaptive online learning using coin betting. AISTATS (PMLR, New York), 943–951.Google Scholar
Jun KS, Orabona F, Wright S, Willett R (2017b) Online learning for changing environments using coin betting. Electronic J. Statist. 11(2):5282–5310.Crossref, Google Scholar
Kalhan DS, Bedi AS, Koppel A, Rajawat K, Hassani H, Gupta AK, Banerjee A (2021) Dynamic online learning via Frank-Wolfe algorithm. IEEE Trans. Signal Processing 69:932–947.Crossref, Google Scholar
Kannan A, Shanbhag UV (2019) Optimal stochastic extragradient schemes for pseudomonotone stochastic variational inequality problems and their variants. Comput. Optim. Appl. 74(3):779–820.Crossref, Google Scholar
Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. ICLR, 1–15.Google Scholar
Kivinen J, Warmuth MK (1999) Averaging expert predictions. EuroCOLT (Springer, Berlin, Heidelberg), 153–167.Google Scholar
Koren T, Levy K (2015) Fast rates for exp-concave empirical risk minimization. NeurIPS, 1477–1485.Google Scholar
Koshal J, Nedić A, Shanbhag UV (2012) Regularized iterative stochastic approximation methods for stochastic variational inequality problems. IEEE Trans. Automatic Control 58(3):594–609.Crossref, Google Scholar
Kotsalis G, Lan G, Li T (2022) Simple and optimal methods for stochastic variational inequalities, I: Operator extrapolation. SIAM J. Optim. 32(3):2041–2073.Crossref, Google Scholar
Krichene S, Krichene W, Dong R, Bayen A (2015) Convergence of heterogeneous distributed learning in stochastic routing games. Allerton (IEEE, Piscataway, NJ), 480–487.Google Scholar
Lei L, Jordan MI (2017) Less than a single pass: Stochastically controlled stochastic gradient. AISTATS (PMLR, New York), 148–156.Google Scholar
Lei L, Jordan MI (2020) On the adaptivity of stochastic gradient-based optimization. SIAM J. Optim. 30(2):1473–1500.Crossref, Google Scholar
Levy KY (2017) Online to offline conversions, universality and adaptive minibatch sizes. NeurIPS, 1612–1621.Google Scholar
Levy KY, Yurtsever A, Cevher V (2018) Online adaptive methods, universality and acceleration. NeurIPS, 6501–6510.Google Scholar
Li X, Orabona F (2019) On the convergence of stochastic gradient descent with adaptive stepsizes. AISTATS (PMLR, New York), 983–992.Google Scholar
Lin T, Zhou Z, Mertikopoulos P, Jordan MI (2020) Finite-time last-iterate convergence for multi-agent learning in games. ICML (PMLR, New York), 6161–6171.Google Scholar
Loizou N, Berard H, Gidel G, Mitliagkas I, Lacoste-Julien S (2021) Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity. NeurIPS, 19095–19108.Google Scholar
Lu Z, Brukhim N, Gradu P, Hazan E (2023) Projection-free adaptive regret with membership oracles. ALT (PMLR, New York), 1055–1073.Google Scholar
Luo H, Agarwal A, Cesa-Bianchi N, Langford J (2016) Efficient second order online learning by sketching. NeurIPS, 902–910.Google Scholar
Mahdavi M, Zhang L, Jin R (2015) Lower and upper bounds on the generalization of stochastic exponentially concave optimization. COLT (PMLR, New York), 1305–1320.Google Scholar
Mertikopoulos P, Zhou Z (2019) Learning in games with continuous action sets and unknown payoff functions. Math. Programming 173(1–2):465–507.Crossref, Google Scholar
Mertikopoulos P, Papadimitriou C, Piliouras G (2018) Cycles in adversarial regularized learning. SODA (SIAM, Philadelphia, PA), 2703–2717.Google Scholar
Mertikopoulos P, Lecouat B, Zenati H, Foo CS, Chandrasekhar V, Piliouras G (2019) Optimistic mirror descent in saddle-point problems: Going the extra(-gradient) mile. ICLR, 1–23.Google Scholar
Mokhtari A, Shahrampour S, Jadbabaie A, Ribeiro A (2016) Online optimization in dynamic environments: Improved regret rates for strongly convex problems. CDC (IEEE, Piscataway, NJ), 7195–7201.Google Scholar
Monnot B, Piliouras G (2017) Limits and limitations of no-regret learning in games. Knowledge Engrg. Rev. 32:e21.Crossref, Google Scholar
Mukkamala MC, Hein M (2017) Variants of RMSProp and Adagrad with logarithmic regret bounds. ICML (PMLR, New York), 2545–2553.Google Scholar
Nemirovski A (2004) Prox-method with rate of convergence o(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM J. Optim. 15(1):229–251.Crossref, Google Scholar
Nemirovski A, Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, Hoboken, NJ).Google Scholar
Nesterov Y (2018) Lectures on Convex Optimization, vol. 137 (Springer, Berlin, Heidelberg).Crossref, Google Scholar
Netessine S, Rudi N (2003) Centralized and competitive inventory models with demand substitution. Oper. Res. 51(2):329–335.Link, Google Scholar
Nguyen LM, van Dijk M, Phan DT, Nguyen PH, Weng TW, Kalagnanam JR (2022) Finite-sum smooth optimization with SARAH. Comput. Optim. Appl. 82(3):561–593.Crossref, Google Scholar
Orabona F, Pál D (2016) Coin betting and parameter-free online learning. NeurIPS, 577–585.Google Scholar
Orda A, Rom R, Shimkin N (1993) Competitive routing in multiuser communication networks. IEEE/ACM Trans. Networking 1(5):510–521.Crossref, Google Scholar
Ordentlich E, Cover TM (1998) The cost of achieving the best portfolio in hindsight. Math. Oper. Res. 23(4):960–982.Link, Google Scholar
Osborne MJ, Rubinstein A (1994) A Course in Game Theory (MIT Press, Cambridge, MA).Google Scholar
Pal S, Wong TKL (2018) Exponentially concave functions and a new information geometry. Ann. Probab. 46(2):1070–1113.Crossref, Google Scholar
Polyak B (1987) Introduction to Optimization (Optimization Software).Google Scholar
Rappaport T (2001) Wireless Communications: Principles and Practice (Prentice Hall PTR, Hoboken, NJ).Google Scholar
Reddi SJ, Kale S, Kumar S (2018) On the convergence of Adam and beyond. ICLR, 1–23.Google Scholar
Rockafellar RT, Sun J (2019) Solving monotone stochastic variational inequalities and complementarity problems by progressive hedging. Math. Programming 174(1–2):453–471.Crossref, Google Scholar
Rockafellar RT, Wets RJB (2017) Stochastic variational inequalities: Single-stage to multistage. Math. Programming 165(1):331–360.Crossref, Google Scholar
Rosen JB (1965) Existence and uniqueness of equilibrium points for concave N-person games. Econometrica 33(3):520–534.Crossref, Google Scholar
Roux NL, Schmidt M, Bach F (2012) A stochastic gradient method with an exponential convergence rate for finite training sets. NIPS, 2663–2671.Google Scholar
Sandholm WH (2015) Population games and deterministic evolutionary dynamics. Handbook of Game Theory with Economic Applications, vol. 4 (Elsevier, Amsterdam), 703–778.Crossref, Google Scholar
Shalev-Shwartz S (2012) Online learning and online convex optimization. Foundations Trends Machine Learn. 4(2):107–194.Google Scholar
Shanbhag UV (2013) Stochastic variational inequality problems: Applications, analysis, and algorithms. Theory Driven by Influential Applications (INFORMS, Catonsville, MD), 71–107.Google Scholar
Shoham Y, Leyton-Brown K (2008) Multiagent Systems: Algorithmic, Game-Theoretic, and Logical Foundations (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Sorin S, Wan C (2016) Finite composite games: Equilibria and dynamics. J. Dynamic Games 3(1):101–120.Crossref, Google Scholar
Tan CW (2014) Wireless network optimization by Perron-Frobenius theory. CISS (IEEE, Piscataway, NJ), 1–6.Google Scholar
Vaswani S, Mishkin A, Laradji I, Schmidt M, Gidel G, Lacoste-Julien S (2019) Painless stochastic gradient: Interpolation, line-search, and convergence rates. NIPS, 3732–3745.Google Scholar
Villani C (2009) Optimal Transport: Old and New, vol. 338 (Springer Science & Business Media, Berlin, Heidelberg).Crossref, Google Scholar
Viossat Y, Zapechelnyuk A (2013) No-regret dynamics and fictitious play. J. Econom. Theory 148(2):825–842.Crossref, Google Scholar
Vovk V (1997) Competitive online linear regression. NIPS, 364–370.Google Scholar
Wan Y, Xue B, Zhang L (2021) Projection-free online learning in dynamic environments. AAAI, 10067–10075.Google Scholar
Wan Y, Zhang L, Song M (2023) Improved dynamic regret for online Frank-Wolfe. Preprint, submitted February 11, https://arxiv.org/abs/2302.05620.Google Scholar
Wang G, Lu S, Cheng Q, Tu W, Zhang L (2020) SAdam: A variant of Adam for strongly convex functions. ICLR, 1–21.Google Scholar
Ward R, Wu X, Bottou L (2019) AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. ICML (PMLR, New York), 6677–6686.Google Scholar
Weeraddana PC, Codreanu M, Latva-aho M, Ephremides A, Fischione C (2012) Weighted Sum-Rate Maximization in Wireless Networks: A Review (Now Foundations and Trends, Piscataway, NJ).Google Scholar
Xu Y, Lin Q, Yang T (2017) Adaptive SVRG methods under error bound conditions with unknown growth parameter. NIPS, 3279–3289.Google Scholar
Yang T, Li Z, Zhang L (2018) A simple analysis for exp-concave empirical minimization with arbitrary convex regularizer. AISTATS, 445–453.Google Scholar
Yang T, Zhang L, Jin R, Yi J (2016) Tracking slowly moving clairvoyant: Optimal dynamic regret of online learning with true and noisy gradient. ICML (PMLR, New York), 449–457.Google Scholar
Yousefian F, Nedić A, Shanbhag UV (2013) A regularized smoothing stochastic approximation (RSSA) algorithm for stochastic variational inequality problems. WSC (IEEE, Piscataway, NJ), 933–944.Google Scholar
Yousefian F, Nedić A, Shanbhag UV (2014) Optimal robust smoothing extragradient algorithms for stochastic variational inequality problems. CDC (IEEE, Piscataway, NJ), 5831–5836.Google Scholar
Yu Y, Lin T, Mazumdar E, Jordan MI (2022) Fast distributionally robust learning with variance-reduced min-max optimization. AISTATS (PMLR, New York), 1219–1250.Google Scholar
Zhang L, Liu TY, Zhou ZH (2019) Adaptive regret of convex and smooth functions. ICML (PMLR, New York), 7414–7423.Google Scholar
Zhang L, Lu S, Zhou ZH (2018a) Adaptive online learning in dynamic environments. NIPS, 1330–1340.Google Scholar
Zhang L, Yang T, Zhou ZH (2018b) Dynamic regret of strongly adaptive methods. ICML (PMLR, New York), 5882–5891.Google Scholar
Zhou Z, Mertikopoulos P, Moustakas AL, Bambos N, Glynn P (2017) Mirror descent learning in continuous games. CDC (IEEE, Piscataway, NJ), 5776–5783.Google Scholar
Zhou Z, Mertikopoulos P, Moustakas AL, Bambos N, Glynn P (2021) Robust power management via learning and game design. Oper. Res. 69(1):331–345.Link, Google Scholar
Zhou Z, Mertikopoulos P, Athey S, Bambos N, Glynn P, Ye Y (2018) Learning in games with lossy feedback. NIPS, 1–11.Google Scholar
Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. ICML, 928–936.Google Scholar
Zou F, Shen L, Jie Z, Zhang W, Liu W (2019) A sufficient condition for convergences of Adam and RMSProp. CVPR, 11127–11135.Google Scholar

Volume 73, Issue 3

May-June 2025

Pages iii-viii, 1151-1722, C2-C3

Article Information

Metrics

Information

Received:August 28, 2022
Accepted:October 20, 2023
Published Online:May 23, 2024

Cite as

Michael Jordan; , Tianyi Lin; , Zhengyuan Zhou (2024) Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback. Operations Research 73(3):1675-1702.

https://doi.org/10.1287/opre.2022.0446

Keywords

Acknowledgments

The authors thank the associate editor and two anonymous reviewers for constructive comments that improve the presentation and results of this paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

References

Volume 73, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News