Robustly Stable Accelerated Momentum Methods with a Near-Optimal Gain and Performance
Abstract
We consider the problem of minimizing a strongly convex smooth function where the gradients are subject to additive worst-case deterministic errors that are square summable. We study the trade-offs between the convergence rate and robustness to gradient errors when designing the parameters of a first-order algorithm. We focus on a general class of momentum methods (GMM) with constant step size and two momentum parameters that can recover gradient descent (GD), Nesterov’s accelerated gradient (NAG), the heavy-ball (HB) method, and the triple-momentum method as special cases. We measure the robustness of an algorithm in terms of the cumulative suboptimality in function values over the iterations normalized by the squared norm of the gradient errors. This quantity can be interpreted as the (squared) gain of a dynamical system that represents the GMM iterations where the input is the gradient error sequence and the output is a weighted distance to the optimum. For quadratic objectives, we compute the gain by explicitly leveraging its representation as the norm of the GMM system in the frequency domain and construct gradient errors that lead to worst-case performance explicitly. We also study the stability of GMM with respect to multiplicative errors by characterizing the structured real and stability radius of the GMM system through their connections to the norm. This allows us to compare GD, HB, and NAG methods in terms of robustness and argue that HB is not as robust as NAG, despite being the fastest in terms of the rate. We then develop the robustly stable HB method that can be faster than NAG while being at the best robustness level possible. We also propose the robustly stable GD that is the fastest version of GD with constant step size while being at the best robustness level. In addition, we extend our framework to general strongly convex smooth objectives, providing nonasymptotic rate results for inexact GMMs and bounds on the gain where we can choose the GMM parameters to systematically trade off the rate to robustness in a computationally tractable framework. Finally, we discuss how our results can be extended to more general noise that can be stochastic.
Funding: This work was funded in part by the Office of Naval Research [Grants N00014-21-1-2244 and N00014-24-12628] and the National Science Foundation [Grant DMS-2053485].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2023.0321.

