Robustly Stable Accelerated Momentum Methods with a Near-Optimal L2 Gain and H Performance

Published Online:https://doi.org/10.1287/moor.2023.0321

References

  • [1] Arjevani Y, Shamir O (2016) On the iteration complexity of oblivious first-order optimization algorithms. Florina BM, Kilian QW, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 908–916.Google Scholar
  • [2] Arzelier D, Georgia D, Gumussoy S, Henrion D (2010) H2 for HIFOO. Preprint, submitted October 7, https://arxiv.org/abs/1010.1442.Google Scholar
  • [3] Aujol J-F, Dossal C (2015) Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4):2408–2433.CrossrefGoogle Scholar
  • [4] Aybat NS, Fallah A, Gürbüzbalaban M, Ozdaglar A (2019) A universally optimal multistage accelerated stochastic gradient method. Adv. Neural Inform. Processing Systems 32:8523–8534.Google Scholar
  • [5] Aybat NS, Fallah A, Gürbüzbalaban M, Ozdaglar A (2020) Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1):717–751.CrossrefGoogle Scholar
  • [6] Baes M (2009) Estimate sequence methods: Extensions and approximations. Working paper, Institute for Operations Research, ETH, Zurich.Google Scholar
  • [7] Barré M, Taylor A, d’Aspremont A (2020) Complexity guarantees for Polyak steps with momentum. Jacob A, Shivani A, eds. Proc. 33rd Conf. Learn. Theory, vol. 125 (PMLR, New York), 452–478. Google Scholar
  • [8] Benner P, Voigt M (2014) A structured pseudospectral method for H∞-norm computation of large-scale descriptor systems. Math. Control Signals Systems 26:303–338.CrossrefGoogle Scholar
  • [9] Bertsekas DP (2010) Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
  • [10] Bertsekas D (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA)Google Scholar
  • [11] Bertsekas DP, Tsitsiklis JN (2000) Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3):627–642.CrossrefGoogle Scholar
  • [12] Boyd S, Balakrishnan S (1990) A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems Control Lett. 15(1):1–7.CrossrefGoogle Scholar
  • [13] Boyd SP, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [14] Bruinsma NA, Steinbuch M (1990) A fast algorithm to compute the H∞-norm of a transfer function matrix. Systems Control Lett. 14(4):287–293.CrossrefGoogle Scholar
  • [15] Burke JV, Henrion D, Lewis AS, Overton ML (2006) HIFOO—A MATLAB package for fixed-order controller design and H∞ optimization. IFAC Proc. Vol. 39(9):339–344.CrossrefGoogle Scholar
  • [16] Can B, Gürbüzbalaban M (2022) Entropic risk-averse generalized momentum methods. Preprint, submitted April 24, https://doi.org/10.48550/arXiv.2204.11292.Google Scholar
  • [17] Can B, Gürbüzbalaban M, Zhu L (2019) Accelerated linear convergence of stochastic momentum methods in Wasserstein distances. Kamalika C, Ruslan S, eds. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 891–901.Google Scholar
  • [18] d’Aspremont A (2008) Smooth optimization with approximate gradient. SIAM J. Optim. 19(3):1171–1183.CrossrefGoogle Scholar
  • [19] De Klerk E, Glineur F, Taylor AB (2017) On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11:1185–1199.CrossrefGoogle Scholar
  • [20] De Klerk E, Glineur F, Taylor AB (2020) Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3):2053–2082.CrossrefGoogle Scholar
  • [21] Devolder O (2011) Stochastic first order methods in smooth convex optimization. CORE Discussion Paper 2011/70, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
  • [22] Devolder O (2013) Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization. PhD thesis, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
  • [23] Devolder O, Glineur F, Nesterov Y (2013) First-order methods with inexact oracle: The strongly convex case. CORE Discussion Paper 2013/16, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
  • [24] Devolder O, Glineur F, Nesterov Y (2013) Intermediate gradient methods for smooth convex problems with inexact oracle. Technical Report No. CORE-2013017, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
  • [25] Devolder O, Glineur F, Nesterov Y (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146:37–75.CrossrefGoogle Scholar
  • [26] Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145(1–2):451–482.CrossrefGoogle Scholar
  • [27] Fallah A, Gürbüzbalaban M, Ozdaglar A, Şimşekli U, Zhu L (2022) Robust distributed accelerated stochastic gradient methods for multi-agent networks. J. Machine Learn. Res. 23(1):9893–9988.Google Scholar
  • [28] Fazlyab M, Ribeiro A, Morari M, Preciado VM (2018) Analysis of optimization algorithms via integral quadratic constraints: Nonstrongly convex problems. SIAM J. Optim. 28(3):2654–2689.CrossrefGoogle Scholar
  • [29] Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.CrossrefGoogle Scholar
  • [30] Gannot O (2022) A frequency-domain analysis of inexact gradient methods. Math. Programming 194(1–2):975–1016.CrossrefGoogle Scholar
  • [31] Ghadimi S, Lan G (2012) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4):1469–1492.CrossrefGoogle Scholar
  • [32] Ghadimi S, Lan G (2013) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4):2061–2089.CrossrefGoogle Scholar
  • [33] Gitman I, Lang H, Zhang P, Xiao L (2019) Understanding the role of momentum in stochastic gradient methods. Adv. Neural Inform. Processing Systems 32:3546–3556.Google Scholar
  • [34] Guglielmi N, Gürbüzbalaban M, Overton ML (2013) Fast approximation of the H∞ norm via optimization over spectral value sets. SIAM J. Matrix Anal. Appl. 34(2):709–737.CrossrefGoogle Scholar
  • [35] Gumussoy S, Henrion D, Millstone M, Overton ML (2009) Multiobjective robust control with HIFOO 2.0. 6th IFAC Sympos. Robust Control Design ROCOND’09 42(6):144–149.Google Scholar
  • [36] Gürbüzbalaban M (2012) Theory and methods for problems arising in robust stability, optimization and quantization. PhD thesis, New York University, New York.Google Scholar
  • [37] Gürbüzbalaban M, Ozdaglar A, Parrilo P (2015) A globally convergent incremental Newton method. Math. Programming 151(1):283–313.CrossrefGoogle Scholar
  • [38] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2):1035–1048.CrossrefGoogle Scholar
  • [39] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2019) Convergence rate of incremental gradient and incremental Newton methods. SIAM J. Optim. 29(4):2542–2565.CrossrefGoogle Scholar
  • [40] Gürbüzbalaban M, Ozdaglar A, Parrilo PA (2021) Why random reshuffling beats stochastic gradient descent. Math. Programming 186:49–84.CrossrefGoogle Scholar
  • [41] Hardt M (2014) Robustness versus acceleration. Accessed December 1, 2025, http://blog.mrtz.org/2014/08/18/robustness-versus-acceleration.Google Scholar
  • [42] Hinrichsen D, Kelb B (1993) Spectral value sets: A graphical tool for robustness analysis. Systems Control Lett. 21(2):127–136.CrossrefGoogle Scholar
  • [43] Hinrichsen D, Pritchard AJ (2005) Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness, vol. 134 (Springer, Berlin).CrossrefGoogle Scholar
  • [44] Hinrichsen D, Son NK (1991) Stability radii of linear discrete-time systems and symplectic pencils. Internat. J. Robust Nonlinear Control 1(2):79–97.CrossrefGoogle Scholar
  • [45] Hu B, Lessard L (2017) Dissipativity theory for Nesterov’s accelerated method. Doina P, Yee Whye T, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 1549–1557.Google Scholar
  • [46] Huang JW, Roberts SJ, Calliess J-P (2023) On the sample complexity of Lipschitz constant estimation. Preprint, submitted September 19, https://openreview.net/forum?id=UIalYAHdBH.Google Scholar
  • [47] Karimi H, Nutini J, Schmidt M (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Lojasiewicz condition. Paolo F, Niels L, Giuseppe M, Jilles V, eds. Machine Learn. Knowledge Discovery Databases Eur. Conf. ECML PKDD 2016 (Springer, Berlin), 795–811.Google Scholar
  • [48] Karow M (2003) Geometry of spectral value sets. PhD thesis, Universitat Bremen, Bremen, Germany.Google Scholar
  • [49] Lessard L, Recht B, Packard A (2016) Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1):57–95.CrossrefGoogle Scholar
  • [50] Lighthill MJ (1958) An Introduction to Fourier Analysis and Generalised Functions (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [51] Lin W, Byrnes CI (1996) H∞-control of discrete-time nonlinear systems. IEEE Trans. Automatic Control 41(4):494–510.CrossrefGoogle Scholar
  • [52] Łojasiewicz S (1963) Une propriété topologique des sous-ensembles analytiques réels. Malgrange B, ed. Les équations aux dérivées partielles (Paris, 1962) (Éditions du Centre National de la Recherche Scientifique, Paris), 87–89.Google Scholar
  • [53] Łojasiewicz S (1982) Sur les Trajectoires du Gradient d’une Fonction Analytique (University of Bologna, Bologna, Italy), 115–117.Google Scholar
  • [54] Lu T-T, Shiou S-H (2002) Inverses of 2 × 2 block matrices. Comput. Math. Appl. 43(1–2):119–129.CrossrefGoogle Scholar
  • [55] Luo Z-Q, Tseng P (1993) Error bounds and convergence analysis of feasible descent methods: A general approach. Ann. Oper. Res. 46(1):157–178.CrossrefGoogle Scholar
  • [56] Malitsky Y, Mishchenko K (2020) Adaptive gradient descent without descent. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 6702–6712.Google Scholar
  • [57] Mitchell T, Overton ML (2016) Hybrid expansion–contraction: A robust scaleable method for approximating the H∞ norm. IMA J. Numer. Anal. 36(3):985–1014.CrossrefGoogle Scholar
  • [58] Mohammadi H, Razaviyayn M, Jovanović MR (2021) Robustness of accelerated first-order algorithms for strongly convex optimization problems. IEEE Trans. Automatic Control 66(6):2480–2495.CrossrefGoogle Scholar
  • [59] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [60] Nesterov Y (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii Nauk SSSR 269(3):543–547.Google Scholar
  • [61] Nesterov Y (2004) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Kluwer Academic, New York).CrossrefGoogle Scholar
  • [62] Nesterov Y (2018) Lectures on Convex Optimization, vol. 137 (Springer, Berlin).CrossrefGoogle Scholar
  • [63] Oppenheim AV, Schafer RW, Buck JR (1999) Discrete Time Signal Processing, 2nd ed. (Prentice Hall, Upper Saddle River, NJ).Google Scholar
  • [64] Polyak BT (1987) Introduction to Optimization (Optimization Software, New York).Google Scholar
  • [65] Qiu L, Bernhardsson B, Rantzer A, Davison EJ, Young PM, Doyle JC (1995) A formula for computation of the real stability radius. Automatica 31(6):879–890.CrossrefGoogle Scholar
  • [66] Schmidt M, Le Roux N, Bach F (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Adv. Neural Inform. Processing Systems 24:1458–1466.Google Scholar
  • [67] Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 162:83–112.CrossrefGoogle Scholar
  • [68] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [69] Skogestad S, Postlethwaite I (2005) Multivariable Feedback Control: Analysis and Design (John Wiley & Sons, Hoboken, NJ).Google Scholar
  • [70] Stein EM, Shakarchi R (2003) Fourier Analysis: An Introduction, Princeton Lecture Notes in Analysis I (Princeton University Press, Princeton, NJ).Google Scholar
  • [71] Tran DN, Kellett CM, Dower PM (2017) Qualitative equivalences of ISS and ℓp-gain stability properties for discrete-time nonlinear systems. Automatica 77:360–369.CrossrefGoogle Scholar
  • [72] Trefethen LN (1999) Spectra and pseudospectra. Ainsworth M, Levesley J, Marletta M, eds. The Graduate Student’s Guide to Numerical Analysis ‘98, Lecture Notes from the VIII EPSRC Summer School in Numerical Analysis (Springer, Berlin), 217–250.CrossrefGoogle Scholar
  • [73] Turner MC, Bates DG (2007) Mathematical Methods for Robust and Nonlinear Control: EPSRC Summer School, Book Series on Control Systems (Springer, Berlin).CrossrefGoogle Scholar
  • [74] van der Schaft A (2016) L2-Gain and Passivity Techniques in Nonlinear Control (Springer, Berlin).Google Scholar
  • [75] Van Scoy B, Lessard L (2021) The speed-robustness trade-off for first-order methods with additive gradient noise. Preprint, submitted September 10, https://arxiv.org/abs/2109.05059.Google Scholar
  • [76] Van Scoy B, Freeman RA, Lynch KM (2018) The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1):49–54.CrossrefGoogle Scholar
  • [77] Varga RS (1999) Matrix Iterative Analysis, vol. 27 (Springer Science & Business Media, New York).Google Scholar
  • [78] Wright SJ, Recht B (2022) Optimization for Data Analysis (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [79] Zhang X, Serhat Aybat N, Gürbüzbalaban M (2022) SAPD+: An accelerated stochastic method for nonconvex-concave minimax problems. Adv. Neural Inform. Processing Systems 35:21668–21681. Google Scholar
  • [80] Zhou K, Doyle JC (1998) Essentials of Robust Control, vol. 104 (Prentice Hall, Upper Saddle River, NJ).Google Scholar
  • [81] Zhou K, Doyle JC, Glover K (1996) Robust and Optimal Control (Prentice Hall, Englewood Cliffs, NJ).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.