Composing Optimized Stepsize Schedules for Gradient Descent

Benjamin Grimmer
[email protected]
https://orcid.org/0000-0001-6003-4024
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218
Kevin Shu
[email protected]
https://orcid.org/0000-0001-6003-4024
Computational and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125
Alex L. Wang (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-4293-0359
Daniels School of Business, Purdue University, West Lafayette, Indiana 47907

Published Online: 4 Nov 2025

Abstract

Recent works by Altschuler and Parrilo and by Grimmer, Shu, and Wang have shown that it is possible to accelerate the convergence of gradient descent on smooth convex functions, even without momentum, just by picking special stepsizes. In this paper, we provide a general theory for composing stepsize schedules, capturing all recent advances in this area and more. We propose three notions of “composable” stepsize schedules, each with elementary associated composition operations for combining them. From these operations, in addition to recovering recent works, we construct three highly optimized sequences of stepsize schedules. We first construct optimized stepsize schedules of every length, generalizing the exponentially spaced silver stepsizes of Altschuler and Parrilo. We then construct highly optimized stepsize schedules for minimizing the final objective gap or gradient norm, improving on prior rates by constants and, more importantly, matching or beating the numerically computed minimax optimal schedules of Das Gupta, Van Parys, and Ryu. We conjecture that these schedules are in fact minimax (information theoretic) optimal. Several novel tertiary results follow from our theory, including a recovery of the recent dynamic gradient norm minimizing short stepsizes of Rotaru, Glineur, and Patrinos and an extension of them to objective gap minimization.
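
As a minimal illustration of what a stepsize schedule is (and not of the constructions in this paper), the sketch below runs gradient descent with the update x_{t+1} = x_t - (h_t / L) * grad f(x_t) using the silver stepsizes of Altschuler and Parrilo. It assumes the silver schedule of length 2^k - 1 takes the form h_t = 1 + rho^(nu(t) - 1), with rho = 1 + sqrt(2) and nu(t) the 2-adic valuation of t. The function names, the test quadratic, and all parameters are hypothetical choices made for illustration only.

```python
import math


def silver_schedule(k):
    """Normalized silver stepsizes of length 2**k - 1.

    Assumed form: h_t = 1 + rho**(nu(t) - 1) for t = 1, ..., 2**k - 1,
    where rho = 1 + sqrt(2) and nu(t) is the 2-adic valuation of t.
    """
    rho = 1 + math.sqrt(2)
    steps = []
    for t in range(1, 2 ** k):
        nu = (t & -t).bit_length() - 1  # number of trailing zero bits of t
        steps.append(1 + rho ** (nu - 1))
    return steps


def gradient_descent(grad, x0, schedule, L):
    """Run x <- x - (h / L) * grad(x) once for each normalized stepsize h."""
    x = list(x0)
    for h in schedule:
        g = grad(x)
        x = [xi - (h / L) * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test problem: f(x) = 0.5 * sum(lam_i * x_i**2), which is
# convex and L-smooth with L = max(lam) = 1.
lam = [1.0, 0.1, 0.01]
L = 1.0


def f(x):
    return 0.5 * sum(l * xi * xi for l, xi in zip(lam, x))


def grad_f(x):
    return [l * xi for l, xi in zip(lam, x)]


x0 = [1.0, 1.0, 1.0]
schedule = silver_schedule(3)  # 7 stepsizes
x_silver = gradient_descent(grad_f, x0, schedule, L)
x_const = gradient_descent(grad_f, x0, [1.0] * len(schedule), L)

print("silver schedule:", [round(h, 4) for h in schedule])
print("final objective, silver stepsizes:  ", f(x_silver))
print("final objective, constant stepsize: ", f(x_const))
```

Individual silver steps exceed the classical safe range (the largest here is 2 + sqrt(2)), so the objective need not decrease at every iteration; the schedule is meant to be judged by its final iterate. A single small quadratic like this one only illustrates the mechanics and says nothing about worst-case rates.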

Funding: Financial support from the Alfred P. Sloan Foundation and the Air Force Office of Scientific Research [Grant FA9550-23-1-0531] is gratefully acknowledged.

Mathematics of Operations Research

Articles In Advance

Information

Received: October 28, 2024
Accepted: August 31, 2025
Published Online: November 04, 2025

Cite as

Benjamin Grimmer, Kevin Shu, Alex L. Wang (2025) Composing Optimized Stepsize Schedules for Gradient Descent. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0764
