Composing Optimized Stepsize Schedules for Gradient Descent

Benjamin Grimmer
Benjamin Grimmer
[email protected]
https://orcid.org/0000-0001-6003-4024
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218
Search for more papers by this author
,
Kevin Shu
Kevin Shu
[email protected]
https://orcid.org/0000-0001-6003-4024
Computational and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125
Search for more papers by this author
,
Alex L. Wang
Corresponding Author
Alex L. Wang
[email protected]
https://orcid.org/0000-0002-4293-0359
Daniels School of Business, Purdue University, West Lafeyette, Indiana 47907
Search for more papers by this author

Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218

Search for more papers by this author

Kevin Shu

[email protected]

https://orcid.org/0000-0001-6003-4024

Computational and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125

Search for more papers by this author

Alex L. Wang

Corresponding Author

Alex L. Wang

[email protected]

https://orcid.org/0000-0002-4293-0359

Daniels School of Business, Purdue University, West Lafeyette, Indiana 47907

Search for more papers by this author

Published Online:4 Nov 2025

References

[1] Altschuler JM, Parrilo PA (2025) Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Math. Programming 213:1105–1118.Crossref, Google Scholar
[2] Altschuler JM, Parrilo PA (2025) Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule. J. ACM 72(2):1–38.Crossref, Google Scholar
[3] Das Gupta S, Van Parys BPG, Ryu EK (2024) Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Math. Program. 204(1–2):567–639. Crossref, Google Scholar
[4] Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145:451–482.Crossref, Google Scholar
[5] Grimmer B, Shu K, Wang AL (2023) Accelerated gradient descent via long steps. Preprint, submitted September 27, https://arxiv.org/abs/2309.09961.Google Scholar
[6] Grimmer B, Shu K, Wang AL (2024) A strengthened conjecture on the minimax optimal constant stepsize for gradient descent. Preprint, submitted July 16, https://arxiv.org/abs/2407.11739.Google Scholar
[7] Grimmer B, Shu K, Wang AL (2025) Accelerated objective gap and gradient norm convergence for gradient descent via long steps. INFORMS J. Optim. 7(2):156–169.Link, Google Scholar
[8] Kim J (2025) A proof of exact convergence rate of gradient descent. Part II. Performance criterion (f(xn)−f*)/‖x0−x*‖2. Preprint, submitted March 26, https://arxiv.org/abs/2412.04427.Google Scholar
[9] Kim D, Fessler JA (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159(1):81–107.Crossref, Google Scholar
[10] Kim D, Fessler JA (2021) Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1):192–219.Crossref, Google Scholar
[11] Kim J, Ozdaglar A, Park C, Ryu E (2023) Time-reversed dissipation induces duality between minimizing gradient norm and function value. Adv. Neural Inform. Processing Systems 36:23389–23440.Google Scholar
[12] Luner A, Grimmer B (2024) On averaging and extrapolation for gradient descent. Preprint, submitted February 19, https://arxiv.org/abs/2402.12493.Google Scholar
[13] Malitsky Y, Mishchenko K (2020) Adaptive gradient descent without descent. Proc. 37th Internat. Conf. Machine Learn. ICML 2020, vol. 119 (PMLR, New York), 6702–6712.Google Scholar
[14] Nesterov Y (1983) A method for solving the convex programming problem with convergence rate o(1/k2). Proc. USSR Acad. Sci. 269:543–547.Google Scholar
[15] Nesterov Y (2014) Introductory Lectures on Convex Optimization: A Basic Course, 1st ed. (Springer Publishing Company, New York).Google Scholar
[16] Rotaru T, Glineur F, Patrinos P (2024) Exact worst-case convergence rates of gradient descent: A complete analysis for all constant stepsizes over nonconvex and convex functions. Preprint, submitted June 25, https://arxiv.org/abs/2406.17506.Google Scholar
[17] Taylor A, Hendrickx J, Glineur F (2017) Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3):1283–1313.Crossref, Google Scholar
[18] Taylor A, Hendrickx J, Glineur F (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Programming 161:307–345.Crossref, Google Scholar
[19] Teboulle M, Vaisbourd Y (2023) An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Programming 201(1–2):63–96.Crossref, Google Scholar
[20] Wang B, Ma S, Yang J, Zhou D (2024) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum. Preprint, submitted October 11, https://arxiv.org/abs/2410.08890.Google Scholar
[21] Zhang Z, Jiang R (2024) Accelerated gradient descent by concatenation of stepsize schedules. Preprint, submitted October 16, https://arxiv.org/abs/2410.12395.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:October 28, 2024
Accepted:August 31, 2025
Published Online:November 04, 2025

Cite as

Benjamin Grimmer , Kevin Shu , Alex L. Wang (2025) Composing Optimized Stepsize Schedules for Gradient Descent. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0764

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Composing Optimized Stepsize Schedules for Gradient Descent

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News