On Averaging and Extrapolation for Gradient Descent

Alan Luner
Corresponding Author
Alan Luner
[email protected]
https://orcid.org/0009-0009-6989-7438
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218
Search for more papers by this author
,
Benjamin Grimmer
Benjamin Grimmer
[email protected]
https://orcid.org/0000-0002-7003-8448
Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218
Search for more papers by this author

Alan Luner

Corresponding Author

Alan Luner

[email protected]

https://orcid.org/0009-0009-6989-7438

Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218

Search for more papers by this author

Benjamin Grimmer

[email protected]

https://orcid.org/0000-0002-7003-8448

Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland 21218

Search for more papers by this author

Published Online:17 Jul 2025https://doi.org/10.1287/moor.2024.0449

References

[1] Altschuler J (2018) Greed, hedging, and acceleration in convex optimization. PhD thesis, Massachusetts Institute of Technology, Cambridge.Google Scholar
[2] Altschuler JM, Parrilo PA (2024) Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Math. Programming, ePub ahead of print November 25, https://doi.org/10.1007/s10107-024-02164-2.Crossref, Google Scholar
[3] Anderson DG (1965) Iterative procedures for nonlinear integral equations. J. ACM 12(4):547–560.Crossref, Google Scholar
[4] Barré M, Taylor A, Bach F (2023) Principled analyses and design of first-order methods with inexact proximal operators. Math. Programming 201:185–230.Crossref, Google Scholar
[5] Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):1–27.Crossref, Google Scholar
[6] Das Gupta S, Van Parys BPG, Ryu EK (2024) Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization method. Math. Programming 204:567–639.Crossref, Google Scholar
[7] Drori Y (2014) Contributions to the complexity analysis of optimization algorithms. PhD thesis, Tel-Aviv University, Tel Aviv, Israel.Google Scholar
[8] Drori Y (2017) The exact information-based complexity of smooth convex minimization. J. Complexity 39:1–16.Crossref, Google Scholar
[9] Drori Y, Taylor AB (2020) Efficient first-order methods for convex minimization: A constructive approach. Math. Programming 184:183–220.Crossref, Google Scholar
[10] Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145:451–482.Crossref, Google Scholar
[11] Dunning I, Huchette J, Lubin M (2017) JuMP: A modeling language for mathematical optimization. SIAM Rev. 59(2):295–320.Crossref, Google Scholar
[12] Ghadimi E, Feyzmahdavian HR, Johansson M (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE, New York), 310–315.Google Scholar
[13] Goujaud B, Moucer C, Glineur F, Hendrickx J, Taylor A, Dieuleveut A (2024) PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python. Math. Programming 16:337–367.Crossref, Google Scholar
[14] Grimmer B (2024) Provably faster gradient descent via long steps. SIAM J. Optim. 34(3):2588–2608.Crossref, Google Scholar
[15] Grimmer B, Li D (2024) Some primal-dual theory for subgradient methods for strongly convex optimization. Preprint, submitted June 27, https://arxiv.org/abs/2305.17323.Google Scholar
[16] Grimmer B, Shu K, Wang AL (2023) Accelerated gradient descent via long steps. Preprint, submitted September 18, https://arxiv.org/abs/2309.09961.Google Scholar
[17] Gu G, Yang J (2020) Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems. SIAM J. Optim. 30(3):1905–1921.Crossref, Google Scholar
[18] Gustavsson E, Patriksson M, Strömberg AB (2015) Primal convergence from dual subgradient methods for convex optimization. Math. Programming 150:365–390.Crossref, Google Scholar
[19] Kim D (2021) Accelerated proximal point method for maximally monotone operators. Math. Programming 190:57–87.Crossref, Google Scholar
[20] Kim J (2024) A proof of the exact convergence rate of gradient descent. Preprint, submitted December 5, https://arxiv.org/abs/2412.04427.Google Scholar
[21] Kim D, Fessler JA (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159:81–107.Crossref, Google Scholar
[22] Kim J, Ozdaglar A, Park C, Ryu EK (2023) Time-reversed dissipation induces duality between minimizing gradient norm and function value. Preprint, submitted May 11, https://arxiv.org/abs/2305.06628.Google Scholar
[23] Lan G (2020) First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Berlin).Crossref, Google Scholar
[24] Lieder F (2018) Projection based methods for conic linear programming. PhD thesis, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.Google Scholar
[25] Mai VV, Johansson M (2020) Anderson acceleration of proximal gradient methods. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn. ICML’20 (JMLR.org), 6620–6629.Google Scholar
[26] Moreau T, Massias M, Gramfort A, Ablin P, Bannier P-A, Charlier B, Dagréou M, et al. (2024) Benchopt: Reproducible, efficient and collaborative optimization benchmarks. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Proc. 36th Internat. Conf. Neural Inform. Processing Systems NIPS ‘22 (Curran Associates Inc., Red Hook, NY), 25404–25421.Google Scholar
[27] MOSEK ApS (2024) MOSEK optimizer API for Julia manual, version 10.0. MOSEK ApS, Copenhagen.Google Scholar
[28] Nesterov Y (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Proc. USSR Acad. Sci. 269(3):543–547.Google Scholar
[29] Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
[30] Shamir O, Zhang T (2013) Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28 (PMLR, New York), 71–79.Google Scholar
[31] Taylor AB, Hendrickx JM, Glineur F (2017) Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3):1283–1313.Crossref, Google Scholar
[32] Taylor AB, Hendrickx JM, Glineur F (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Programming 161:307–345.Crossref, Google Scholar
[33] Teboulle M, Vaisbourd Y (2023) An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Programming 201:63–96.Crossref, Google Scholar
[34] Wang B, Ma S, Yang J, Zhou D (2024) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum. Preprint, submitted October 11, https://arxiv.org/abs/2410.08890.Google Scholar
[35] Wolfram Research, Inc. (2024) Mathematica, version 14.0. Wolfram Research, Inc., Champaign, IL.Google Scholar
[36] Zamani M, Glineur F (2023) Exact convergence rate of the last iterate in subgradient methods. Preprint, submitted July 20, https://arxiv.org/abs/2307.11134.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:March 18, 2024
Accepted:May 06, 2025
Published Online:July 17, 2025

Cite as

Alan Luner, Benjamin Grimmer (2025) On Averaging and Extrapolation for Gradient Descent. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0449

Keywords

Acknowledgments

The authors thank Adrien Taylor for providing kind and direct feedback on this work’s positioning among the growing PEP literature.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

On Averaging and Extrapolation for Gradient Descent

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News