Robustly Stable Accelerated Momentum Methods with a Near-Optimal Gain and Performance
References
- [1] (2016) On the iteration complexity of oblivious first-order optimization algorithms. Florina BM, Kilian QW, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (PMLR, New York), 908–916.Google Scholar
- [2] (2010) H2 for HIFOO. Preprint, submitted October 7, https://arxiv.org/abs/1010.1442.Google Scholar
- [3] (2015) Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4):2408–2433.Crossref, Google Scholar
- [4] (2019) A universally optimal multistage accelerated stochastic gradient method. Adv. Neural Inform. Processing Systems 32:8523–8534.Google Scholar
- [5] (2020) Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1):717–751.Crossref, Google Scholar
- [6] (2009) Estimate sequence methods: Extensions and approximations. Working paper, Institute for Operations Research, ETH, Zurich.Google Scholar
- [7] (2020) Complexity guarantees for Polyak steps with momentum. Jacob A, Shivani A, eds. Proc. 33rd Conf. Learn. Theory, vol. 125 (PMLR, New York), 452–478. Google Scholar
- [8] (2014) A structured pseudospectral method for H∞-norm computation of large-scale descriptor systems. Math. Control Signals Systems 26:303–338.Crossref, Google Scholar
- [9] (2010) Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
- [10] (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA)Google Scholar
- [11] (2000) Gradient convergence in gradient methods with errors. SIAM J. Optim. 10(3):627–642.Crossref, Google Scholar
- [12] (1990) A regularity result for the singular values of a transfer matrix and a quadratically convergent algorithm for computing its L∞-norm. Systems Control Lett. 15(1):1–7.Crossref, Google Scholar
- [13] (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [14] (1990) A fast algorithm to compute the H∞-norm of a transfer function matrix. Systems Control Lett. 14(4):287–293.Crossref, Google Scholar
- [15] (2006) HIFOO—A MATLAB package for fixed-order controller design and H∞ optimization. IFAC Proc. Vol. 39(9):339–344.Crossref, Google Scholar
- [16] (2022) Entropic risk-averse generalized momentum methods. Preprint, submitted April 24, https://doi.org/10.48550/arXiv.2204.11292.Google Scholar
- [17] (2019) Accelerated linear convergence of stochastic momentum methods in Wasserstein distances. Kamalika C, Ruslan S, eds. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR, New York), 891–901.Google Scholar
- [18] (2008) Smooth optimization with approximate gradient. SIAM J. Optim. 19(3):1171–1183.Crossref, Google Scholar
- [19] (2017) On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions. Optim. Lett. 11:1185–1199.Crossref, Google Scholar
- [20] (2020) Worst-case convergence analysis of inexact gradient and Newton methods through semidefinite programming performance estimation. SIAM J. Optim. 30(3):2053–2082.Crossref, Google Scholar
- [21] (2011) Stochastic first order methods in smooth convex optimization. CORE Discussion Paper 2011/70, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
- [22] (2013) Exactness, inexactness and stochasticity in first-order methods for large-scale convex optimization. PhD thesis, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
- [23] (2013) First-order methods with inexact oracle: The strongly convex case. CORE Discussion Paper 2013/16, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
- [24] (2013) Intermediate gradient methods for smooth convex problems with inexact oracle. Technical Report No. CORE-2013017, Center for Operations Research and Econometrics, Universite Catholique de Louvain, Louvain-la-Neuve, Belgium.Google Scholar
- [25] (2014) First-order methods of smooth convex optimization with inexact oracle. Math. Programming 146:37–75.Crossref, Google Scholar
- [26] (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145(1–2):451–482.Crossref, Google Scholar
- [27] (2022) Robust distributed accelerated stochastic gradient methods for multi-agent networks. J. Machine Learn. Res. 23(1):9893–9988.Google Scholar
- [28] (2018) Analysis of optimization algorithms via integral quadratic constraints: Nonstrongly convex problems. SIAM J. Optim. 28(3):2654–2689.Crossref, Google Scholar
- [29] (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.Crossref, Google Scholar
- [30] (2022) A frequency-domain analysis of inexact gradient methods. Math. Programming 194(1–2):975–1016.Crossref, Google Scholar
- [31] (2012) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM J. Optim. 22(4):1469–1492.Crossref, Google Scholar
- [32] (2013) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization, II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4):2061–2089.Crossref, Google Scholar
- [33] (2019) Understanding the role of momentum in stochastic gradient methods. Adv. Neural Inform. Processing Systems 32:3546–3556.Google Scholar
- [34] (2013) Fast approximation of the H∞ norm via optimization over spectral value sets. SIAM J. Matrix Anal. Appl. 34(2):709–737.Crossref, Google Scholar
- [35] (2009) Multiobjective robust control with HIFOO 2.0. 6th IFAC Sympos. Robust Control Design ROCOND’09 42(6):144–149.Google Scholar
- [36] (2012) Theory and methods for problems arising in robust stability, optimization and quantization. PhD thesis, New York University, New York.Google Scholar
- [37] (2015) A globally convergent incremental Newton method. Math. Programming 151(1):283–313.Crossref, Google Scholar
- [38] (2017) On the convergence rate of incremental aggregated gradient algorithms. SIAM J. Optim. 27(2):1035–1048.Crossref, Google Scholar
- [39] (2019) Convergence rate of incremental gradient and incremental Newton methods. SIAM J. Optim. 29(4):2542–2565.Crossref, Google Scholar
- [40] (2021) Why random reshuffling beats stochastic gradient descent. Math. Programming 186:49–84.Crossref, Google Scholar
- [41] (2014) Robustness versus acceleration. Accessed December 1, 2025, http://blog.mrtz.org/2014/08/18/robustness-versus-acceleration.Google Scholar
- [42] (1993) Spectral value sets: A graphical tool for robustness analysis. Systems Control Lett. 21(2):127–136.Crossref, Google Scholar
- [43] (2005) Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness, vol. 134 (Springer, Berlin).Crossref, Google Scholar
- [44] (1991) Stability radii of linear discrete-time systems and symplectic pencils. Internat. J. Robust Nonlinear Control 1(2):79–97.Crossref, Google Scholar
- [45] (2017) Dissipativity theory for Nesterov’s accelerated method. Doina P, Yee Whye T, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 1549–1557.Google Scholar
- [46] (2023) On the sample complexity of Lipschitz constant estimation. Preprint, submitted September 19, https://openreview.net/forum?id=UIalYAHdBH.Google Scholar
- [47] (2016) Linear convergence of gradient and proximal-gradient methods under the Polyak–Lojasiewicz condition. Paolo F, Niels L, Giuseppe M, Jilles V, eds. Machine Learn. Knowledge Discovery Databases Eur. Conf. ECML PKDD 2016 (Springer, Berlin), 795–811.Google Scholar
- [48] (2003) Geometry of spectral value sets. PhD thesis, Universitat Bremen, Bremen, Germany.Google Scholar
- [49] (2016) Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1):57–95.Crossref, Google Scholar
- [50] (1958) An Introduction to Fourier Analysis and Generalised Functions (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [51] (1996) H∞-control of discrete-time nonlinear systems. IEEE Trans. Automatic Control 41(4):494–510.Crossref, Google Scholar
- [52] (1963) Une propriété topologique des sous-ensembles analytiques réels. Malgrange B, ed. Les équations aux dérivées partielles (Paris, 1962) (Éditions du Centre National de la Recherche Scientifique, Paris), 87–89.Google Scholar
- [53] (1982) Sur les Trajectoires du Gradient d’une Fonction Analytique (University of Bologna, Bologna, Italy), 115–117.Google Scholar
- [54] (2002) Inverses of 2 × 2 block matrices. Comput. Math. Appl. 43(1–2):119–129.Crossref, Google Scholar
- [55] (1993) Error bounds and convergence analysis of feasible descent methods: A general approach. Ann. Oper. Res. 46(1):157–178.Crossref, Google Scholar
- [56] (2020) Adaptive gradient descent without descent. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (PMLR, New York), 6702–6712.Google Scholar
- [57] (2016) Hybrid expansion–contraction: A robust scaleable method for approximating the H∞ norm. IMA J. Numer. Anal. 36(3):985–1014.Crossref, Google Scholar
- [58] (2021) Robustness of accelerated first-order algorithms for strongly convex optimization problems. IEEE Trans. Automatic Control 66(6):2480–2495.Crossref, Google Scholar
- [59] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [60] (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii Nauk SSSR 269(3):543–547.Google Scholar
- [61] (2004) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Kluwer Academic, New York).Crossref, Google Scholar
- [62] (2018) Lectures on Convex Optimization, vol. 137 (Springer, Berlin).Crossref, Google Scholar
- [63] (1999) Discrete Time Signal Processing, 2nd ed. (Prentice Hall, Upper Saddle River, NJ).Google Scholar
- [64] (1987) Introduction to Optimization (Optimization Software, New York).Google Scholar
- [65] (1995) A formula for computation of the real stability radius. Automatica 31(6):879–890.Crossref, Google Scholar
- [66] (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Adv. Neural Inform. Processing Systems 24:1458–1466.Google Scholar
- [67] (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 162:83–112.Crossref, Google Scholar
- [68] (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
- [69] (2005) Multivariable Feedback Control: Analysis and Design (John Wiley & Sons, Hoboken, NJ).Google Scholar
- [70] (2003) Fourier Analysis: An Introduction, Princeton Lecture Notes in Analysis I (Princeton University Press, Princeton, NJ).Google Scholar
- [71] (2017) Qualitative equivalences of ISS and ℓp-gain stability properties for discrete-time nonlinear systems. Automatica 77:360–369.Crossref, Google Scholar
- [72] (1999) Spectra and pseudospectra. Ainsworth M, Levesley J, Marletta M, eds. The Graduate Student’s Guide to Numerical Analysis ‘98, Lecture Notes from the VIII EPSRC Summer School in Numerical Analysis (Springer, Berlin), 217–250.Crossref, Google Scholar
- [73] (2007) Mathematical Methods for Robust and Nonlinear Control: EPSRC Summer School, Book Series on Control Systems (Springer, Berlin).Crossref, Google Scholar
- [74] (2016) L2-Gain and Passivity Techniques in Nonlinear Control (Springer, Berlin).Google Scholar
- [75] (2021) The speed-robustness trade-off for first-order methods with additive gradient noise. Preprint, submitted September 10, https://arxiv.org/abs/2109.05059.Google Scholar
- [76] (2018) The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1):49–54.Crossref, Google Scholar
- [77] (1999) Matrix Iterative Analysis, vol. 27 (Springer Science & Business Media, New York).Google Scholar
- [78] (2022) Optimization for Data Analysis (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [79] (2022) SAPD+: An accelerated stochastic method for nonconvex-concave minimax problems. Adv. Neural Inform. Processing Systems 35:21668–21681. Google Scholar
- [80] (1998) Essentials of Robust Control, vol. 104 (Prentice Hall, Upper Saddle River, NJ).Google Scholar
- [81] (1996) Robust and Optimal Control (Prentice Hall, Englewood Cliffs, NJ).Google Scholar

