Composing Optimized Stepsize Schedules for Gradient Descent

Published Online:

Recent works by Altschuler and Parrilo and Grimmer, Shu, Wang have shown that it is possible to accelerate the convergence of gradient descent on smooth convex functions, even without momentum, just by picking special stepsizes. In this paper, we provide a general theory for composing stepsize schedules, capturing all recent advances in this area and more. We propose three notions of “composable” stepsize schedules with elementary associated composition operations for combining them. From these operations, in addition to recovering recent works, we construct three highly optimized sequences of stepsize schedules. We first construct optimized stepsize schedules of every length, generalizing the exponentially spaced silver stepsizes of Altschuler and Parrilo. We then construct highly optimized stepsize schedules for minimizing final objective gap or gradient norm, improving on prior rates by constants and, more importantly, matching or beating the numerically computed minimax optimal schedules of Das Gupta, Van Parys, Ryu. We conjecture that these schedules are in fact minimax (information theoretic) optimal. Several novel tertiary results follow from our theory, including recovery of the recent dynamic gradient norm minimizing short stepsizes of Rotaru, Glineur, Patrinos and extending them to objective gap minimization.

Funding: Financial support from the Alfred P. Sloan Foundation and the Air Force Office of Scientific Research [Grant FA9550-23-1-0531] is gratefully acknowledged.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.