Robustly Stable Accelerated Momentum Methods with a Near-Optimal $L_{2}$ Gain and $H_{\infty}$ Performance

Mert Gürbüzbalaban
Mert Gürbüzbalaban
[email protected]
https://orcid.org/0000-0002-0575-2450
Department of Management Science and Information Systems, Rutgers Business School, Piscataway, New Jersey 08854
Search for more papers by this author

Department of Management Science and Information Systems, Rutgers Business School, Piscataway, New Jersey 08854

Search for more papers by this author

Published Online:5 Dec 2025https://doi.org/10.1287/moor.2023.0321

References

[1] Arjevani Y, Shamir O (2016) On the iteration complexity of oblivious first-order optimization algorithms. Florina BM, Kilian QW, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 908–916.Google Scholar
[2] Arzelier D, Georgia D, Gumussoy S, Henrion D (2010) H2 for HIFOO. Preprint, submitted October 7, https://arxiv.org/abs/1010.1442.Google Scholar
[3] Aujol J-F, Dossal C (2015) Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4):2408–2433.Crossref, Google Scholar
[4] Aybat NS, Fallah A, Gürbüzbalaban M, Ozdaglar A (2019) A universally optimal multistage accelerated stochastic gradient method. Adv. Neural Inform. Processing Systems 32:8523–8534.Google Scholar
[5] Aybat NS, Fallah A, Gürbüzbalaban M, Ozdaglar A (2020) Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1):717–751.Crossref, Google Scholar
[6] Baes M (2009) Estimate sequence methods: Extensions and approximations. Working paper, Institute for Operations Research, ETH, Zurich.Google Scholar
[7] Barré M, Taylor A, d’Aspremont A (2020) Complexity guarantees for Polyak steps with momentum. Jacob A, Shivani A, eds. Proc. 33rd Conf. Learn. Theory, vol. 125 (PMLR, New York), 452–478. Google Scholar
[8] Benner P, Voigt M (2014) A structured pseudospectral method for H∞-norm computation of large-scale descriptor systems. Math. Control Signals Systems 26:303–338.Crossref, Google Scholar
[9] Bertsekas DP (2010) Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
[10] Bertsekas D (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA)Google Scholar
[11] Bertsekas DP, Tsitsiklis JN (2000) Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3):627–642.Crossref, Google Scholar
[12] Boyd S, Balakrishnan S (1990) A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems Control Lett. 15(1):1–7.Crossref, Google Scholar
[13] Boyd SP, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[14] Bruinsma NA, Steinbuch M (1990) A fast algorithm to compute the H∞-norm of a transfer function matrix. Systems Control Lett. 14(4):287–293.Crossref, Google Scholar
[15] Burke JV, Henrion D, Lewis AS, Overton ML (2006) HIFOO—A MATLAB package for fixed-order controller design and H∞ optimization. IFAC Proc. Vol. 39(9):339–344.Crossref, Google Scholar
[16] Can B, Gürbüzbalaban M (2022) Entropic risk-averse generalized momentum methods. Preprint, submitted April 24, https://doi.org/10.48550/arXiv.2204.11292.Google Scholar
[17] Can B, Gürbüzbalaban M, Zhu L (2019) Accelerated linear convergence of stochastic momentum methods in Wasserstein distances. Kamalika C, Ruslan S, eds. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 891–901.Google Scholar
[18] d’Aspremont A (2008) Smooth optimization with approximate gradient. SIAM J. Optim. 19(3):1171–1183.Crossref, Google Scholar
[19] De Klerk E, Glineur F, Taylor AB (2017) On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11:1185–1199.Crossref, Google Scholar
[20] De Klerk E, Glineur F, Taylor AB (2020) Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3):2053–2082.Crossref, Google Scholar
[21] Devolder O (2011) Stochastic first order methods in smooth convex optimization. CORE Discussion Paper 2011/70, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
[22] Devolder O (2013) Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization. PhD thesis, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
[23] Devolder O, Glineur F, Nesterov Y (2013) First-order methods with inexact oracle: The strongly convex case. CORE Discussion Paper 2013/16, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
[24] Devolder O, Glineur F, Nesterov Y (2013) Intermediate gradient methods for smooth convex problems with inexact oracle. Technical Report No. CORE-2013017, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
[25] Devolder O, Glineur F, Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146:37–75.Crossref, Google Scholar
[26] Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145(1–2):451–482.Crossref, Google Scholar
[27] Fallah A, Gürbüzbalaban M, Ozdaglar A, Şimşekli U, Zhu L (2022) Robust distributed accelerated stochastic gradient methods for multi-agent networks. J. Machine Learn. Res. 23(1):9893–9988.Google Scholar
[28] Fazlyab M, Ribeiro A, Morari M, Preciado VM (2018) Analysis of optimization algorithms via integral quadratic constraints: Nonstrongly convex problems. SIAM J. Optim. 28(3):2654–2689.Crossref, Google Scholar
[29] Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.Crossref, Google Scholar
[30] Gannot O (2022) A frequency-domain analysis of inexact gradient methods. Math. Programming 194(1–2):975–1016.Crossref, Google Scholar
[31] Ghadimi S, Lan G (2012) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4):1469–1492.Crossref, Google Scholar
[32] Ghadimi S, Lan G (2013) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4):2061–2089.Crossref, Google Scholar
[33] Gitman I, Lang H, Zhang P, Xiao L (2019) Understanding the role of momentum in stochastic gradient methods. Adv. Neural Inform. Processing Systems 32:3546–3556.Google Scholar
[34] Guglielmi N, Gürbüzbalaban M, Overton ML (2013) Fast approximation of the H∞ norm via optimization over spectral value sets. SIAM J. Matrix Anal. Appl. 34(2):709–737.Crossref, Google Scholar
[35] Gumussoy S, Henrion D, Millstone M, Overton ML (2009) Multiobjective robust control with HIFOO 2.0. 6th IFAC Sympos. Robust Control Design ROCOND’09 42(6):144–149.Google Scholar
[36] Gürbüzbalaban M (2012) Theory and methods for problems arising in robust stability, optimization and quantization. PhD thesis, New York University, New York.Google Scholar
[37] Gürbüzbalaban M, Ozdaglar A, Parrilo P (2015) A globally convergent incremental Newton method. Math. Programming 151(1):283–313.Crossref, Google Scholar
[38] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2):1035–1048.Crossref, Google Scholar
[39] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2019) Convergence rate of incremental gradient and incremental Newton methods. SIAM J. Optim. 29(4):2542–2565.Crossref, Google Scholar
[40] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2021) Why random reshuffling beats stochastic gradient descent. Math. Programming 186:49–84.Crossref, Google Scholar
[41] Hardt M (2014) Robustness versus acceleration. Accessed December 1, 2025, http://blog.mrtz.org/2014/08/18/robustness-versus-acceleration.Google Scholar
[42] Hinrichsen D, Kelb B (1993) Spectral value sets: A graphical tool for robustness analysis. Systems Control Lett. 21(2):127–136.Crossref, Google Scholar
[43] Hinrichsen D, Pritchard AJ (2005) Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness, vol. 134 (Springer, Berlin).Crossref, Google Scholar
[44] Hinrichsen D, Son NK (1991) Stability radii of linear discrete-time systems and symplectic pencils. Internat. J. Robust Nonlinear Control 1(2):79–97.Crossref, Google Scholar
[45] Hu B, Lessard L (2017) Dissipativity theory for Nesterov’s accelerated method. Doina P, Yee Whye T, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 1549–1557.Google Scholar
[46] Huang JW, Roberts SJ, Calliess J-P (2023) On the sample complexity of Lipschitz constant estimation. Preprint, submitted September 19, https://openreview.net/forum?id=UIalYAHdBH.Google Scholar
[47] Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Lojasiewicz condition. Paolo F, Niels L, Giuseppe M, Jilles V, eds. Machine Learn. Knowledge Discovery Databases Eur. Conf. ECML PKDD 2016 (Springer, Berlin), 795–811.Google Scholar
[48] Karow M (2003) Geometry of spectral value sets. PhD thesis, Universitat Bremen, Bremen, Germany.Google Scholar
[49] Lessard L, Recht B, Packard A (2016) Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1):57–95.Crossref, Google Scholar
[50] Lighthill MJ (1958) An Introduction to Fourier Analysis and Generalised Functions (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[51] Lin W, Byrnes CI (1996) H∞-control of discrete-time nonlinear systems. IEEE Trans. Automatic Control 41(4):494–510.Crossref, Google Scholar
[52] Łojasiewicz S (1963) Une propriété topologique des sous-ensembles analytiques réels. Malgrange B, ed. Les équations aux dérivées partielles (Paris, 1962) (Éditions du Centre National de la Recherche Scientifique, Paris), 87–89.Google Scholar
[53] Łojasiewicz S (1982) Sur les Trajectoires du Gradient d’une Fonction Analytique (University of Bologna, Bologna, Italy), 115–117.Google Scholar
[54] Lu T-T, Shiou S-H (2002) Inverses of 2 × 2 block matrices. Comput. Math. Appl. 43(1–2):119–129.Crossref, Google Scholar
[55] Luo Z-Q, Tseng P (1993) Error bounds and convergence analysis of feasible descent methods: A general approach. Ann. Oper. Res. 46(1):157–178.Crossref, Google Scholar
[56] Malitsky Y, Mishchenko K (2020) Adaptive gradient descent without descent. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 6702–6712.Google Scholar
[57] Mitchell T, Overton ML (2016) Hybrid expansion–contraction: A robust scaleable method for approximating the H∞ norm. IMA J. Numer. Anal. 36(3):985–1014.Crossref, Google Scholar
[58] Mohammadi H, Razaviyayn M, Jovanović MR (2021) Robustness of accelerated first-order algorithms for strongly convex optimization problems. IEEE Trans. Automatic Control 66(6):2480–2495.Crossref, Google Scholar
[59] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[60] Nesterov Y (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii Nauk SSSR 269(3):543–547.Google Scholar
[61] Nesterov Y (2004) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Kluwer Academic, New York).Crossref, Google Scholar
[62] Nesterov Y (2018) Lectures on Convex Optimization, vol. 137 (Springer, Berlin).Crossref, Google Scholar
[63] Oppenheim AV, Schafer RW, Buck JR (1999) Discrete Time Signal Processing, 2nd ed. (Prentice Hall, Upper Saddle River, NJ).Google Scholar
[64] Polyak BT (1987) Introduction to Optimization (Optimization Software, New York).Google Scholar
[65] Qiu L, Bernhardsson B, Rantzer A, Davison EJ, Young PM, Doyle JC (1995) A formula for computation of the real stability radius. Automatica 31(6):879–890.Crossref, Google Scholar
[66] Schmidt M, Le Roux N, Bach F (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Adv. Neural Inform. Processing Systems 24:1458–1466.Google Scholar
[67] Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 162:83–112.Crossref, Google Scholar
[68] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
[69] Skogestad S, Postlethwaite I (2005) Multivariable Feedback Control: Analysis and Design (John Wiley & Sons, Hoboken, NJ).Google Scholar
[70] Stein EM, Shakarchi R (2003) Fourier Analysis: An Introduction, Princeton Lecture Notes in Analysis I (Princeton University Press, Princeton, NJ).Google Scholar
[71] Tran DN, Kellett CM, Dower PM (2017) Qualitative equivalences of ISS and ℓp-gain stability properties for discrete-time nonlinear systems. Automatica 77:360–369.Crossref, Google Scholar
[72] Trefethen LN (1999) Spectra and pseudospectra. Ainsworth M, Levesley J, Marletta M, eds. The Graduate Student’s Guide to Numerical Analysis ‘98, Lecture Notes from the VIII EPSRC Summer School in Numerical Analysis (Springer, Berlin), 217–250.Crossref, Google Scholar
[73] Turner MC, Bates DG (2007) Mathematical Methods for Robust and Nonlinear Control: EPSRC Summer School, Book Series on Control Systems (Springer, Berlin).Crossref, Google Scholar
[74] van der Schaft A (2016) L2-Gain and Passivity Techniques in Nonlinear Control (Springer, Berlin).Google Scholar
[75] Van Scoy B, Lessard L (2021) The speed-robustness trade-off for first-order methods with additive gradient noise. Preprint, submitted September 10, https://arxiv.org/abs/2109.05059.Google Scholar
[76] Van Scoy B, Freeman RA, Lynch KM (2018) The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1):49–54.Crossref, Google Scholar
[77] Varga RS (1999) Matrix Iterative Analysis, vol. 27 (Springer Science & Business Media, New York).Google Scholar
[78] Wright SJ, Recht B (2022) Optimization for Data Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[79] Zhang X, Serhat Aybat N, Gürbüzbalaban M (2022) SAPD+: An accelerated stochastic method for nonconvex-concave minimax problems. Adv. Neural Inform. Processing Systems 35:21668–21681. Google Scholar
[80] Zhou K, Doyle JC (1998) Essentials of Robust Control, vol. 104 (Prentice Hall, Upper Saddle River, NJ).Google Scholar
[81] Zhou K, Doyle JC, Glover K (1996) Robust and Optimal Control (Prentice Hall, Englewood Cliffs, NJ).Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:October 20, 2023
Accepted:July 22, 2025
Published Online:December 05, 2025

Cite as

Mert Gürbüzbalaban (2025) Robustly Stable Accelerated Momentum Methods with a Near-Optimal

L_{2}

Gain and

H_{\infty}

Performance. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0321

Keywords

Acknowledgments

The author expresses gratitude to the associate editor and two anonymous referees for their valuable suggestions and comments that improved the quality of this work. The author also acknowledge his former student Bugra Can for pointing out a useful lemma for the inversion of block matrices and for helpful discussions about the manuscript.

This paper is dedicated to the 70th birthday of Michael Overton (New York University), who has made key contributions to the computation of various robustness measures for linear dynamical systems, including the stability radius and the H∞ norm.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Robustly Stable Accelerated Momentum Methods with a Near-Optimal $L_{2}$ Gain and $H_{\infty}$ Performance

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News

Available Issues

Available Issues

Robustly Stable Accelerated Momentum Methods with a Near-Optimal L2 Gain and H∞ Performance

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Robustly Stable Accelerated Momentum Methods with a Near-Optimal $L_{2}$ Gain and $H_{\infty}$ Performance