On Averaging and Extrapolation for Gradient Descent

Published Online:https://doi.org/10.1287/moor.2024.0449

References

  • [1] Altschuler J (2018) Greed, hedging, and acceleration in convex optimization. PhD thesis, Massachusetts Institute of Technology, Cambridge.Google Scholar
  • [2] Altschuler JM, Parrilo PA (2024) Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Math. Programming, ePub ahead of print November 25, https://doi.org/10.1007/s10107-024-02164-2.CrossrefGoogle Scholar
  • [3] Anderson DG (1965) Iterative procedures for nonlinear integral equations. J. ACM 12(4):547–560.CrossrefGoogle Scholar
  • [4] Barré M, Taylor A, Bach F (2023) Principled analyses and design of first-order methods with inexact proximal operators. Math. Programming 201:185–230.CrossrefGoogle Scholar
  • [5] Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):1–27.CrossrefGoogle Scholar
  • [6] Das Gupta S, Van Parys BPG, Ryu EK (2024) Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization method. Math. Programming 204:567–639.CrossrefGoogle Scholar
  • [7] Drori Y (2014) Contributions to the complexity analysis of optimization algorithms. PhD thesis, Tel-Aviv University, Tel Aviv, Israel.Google Scholar
  • [8] Drori Y (2017) The exact information-based complexity of smooth convex minimization. J. Complexity 39:1–16.CrossrefGoogle Scholar
  • [9] Drori Y, Taylor AB (2020) Efficient first-order methods for convex minimization: A constructive approach. Math. Programming 184:183–220.CrossrefGoogle Scholar
  • [10] Drori Y, Teboulle M (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145:451–482.CrossrefGoogle Scholar
  • [11] Dunning I, Huchette J, Lubin M (2017) JuMP: A modeling language for mathematical optimization. SIAM Rev. 59(2):295–320.CrossrefGoogle Scholar
  • [12] Ghadimi E, Feyzmahdavian HR, Johansson M (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE, New York), 310–315.Google Scholar
  • [13] Goujaud B, Moucer C, Glineur F, Hendrickx J, Taylor A, Dieuleveut A (2024) PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python. Math. Programming 16:337–367.CrossrefGoogle Scholar
  • [14] Grimmer B (2024) Provably faster gradient descent via long steps. SIAM J. Optim. 34(3):2588–2608.CrossrefGoogle Scholar
  • [15] Grimmer B, Li D (2024) Some primal-dual theory for subgradient methods for strongly convex optimization. Preprint, submitted June 27, https://arxiv.org/abs/2305.17323.Google Scholar
  • [16] Grimmer B, Shu K, Wang AL (2023) Accelerated gradient descent via long steps. Preprint, submitted September 18, https://arxiv.org/abs/2309.09961.Google Scholar
  • [17] Gu G, Yang J (2020) Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems. SIAM J. Optim. 30(3):1905–1921.CrossrefGoogle Scholar
  • [18] Gustavsson E, Patriksson M, Strömberg AB (2015) Primal convergence from dual subgradient methods for convex optimization. Math. Programming 150:365–390.CrossrefGoogle Scholar
  • [19] Kim D (2021) Accelerated proximal point method for maximally monotone operators. Math. Programming 190:57–87.CrossrefGoogle Scholar
  • [20] Kim J (2024) A proof of the exact convergence rate of gradient descent. Preprint, submitted December 5, https://arxiv.org/abs/2412.04427.Google Scholar
  • [21] Kim D, Fessler JA (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159:81–107.CrossrefGoogle Scholar
  • [22] Kim J, Ozdaglar A, Park C, Ryu EK (2023) Time-reversed dissipation induces duality between minimizing gradient norm and function value. Preprint, submitted May 11, https://arxiv.org/abs/2305.06628.Google Scholar
  • [23] Lan G (2020) First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Berlin).CrossrefGoogle Scholar
  • [24] Lieder F (2018) Projection based methods for conic linear programming. PhD thesis, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.Google Scholar
  • [25] Mai VV, Johansson M (2020) Anderson acceleration of proximal gradient methods. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn. ICML’20 (JMLR.org), 6620–6629.Google Scholar
  • [26] Moreau T, Massias M, Gramfort A, Ablin P, Bannier P-A, Charlier B, Dagréou M, et al. (2024) Benchopt: Reproducible, efficient and collaborative optimization benchmarks. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Proc. 36th Internat. Conf. Neural Inform. Processing Systems NIPS ‘22 (Curran Associates Inc., Red Hook, NY), 25404–25421.Google Scholar
  • [27] MOSEK ApS (2024) MOSEK optimizer API for Julia manual, version 10.0. MOSEK ApS, Copenhagen.Google Scholar
  • [28] Nesterov Y (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Proc. USSR Acad. Sci. 269(3):543–547.Google Scholar
  • [29] Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.CrossrefGoogle Scholar
  • [30] Shamir O, Zhang T (2013) Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28 (PMLR, New York), 71–79.Google Scholar
  • [31] Taylor AB, Hendrickx JM, Glineur F (2017) Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3):1283–1313.CrossrefGoogle Scholar
  • [32] Taylor AB, Hendrickx JM, Glineur F (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Programming 161:307–345.CrossrefGoogle Scholar
  • [33] Teboulle M, Vaisbourd Y (2023) An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Programming 201:63–96.CrossrefGoogle Scholar
  • [34] Wang B, Ma S, Yang J, Zhou D (2024) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum. Preprint, submitted October 11, https://arxiv.org/abs/2410.08890.Google Scholar
  • [35] Wolfram Research, Inc. (2024) Mathematica, version 14.0. Wolfram Research, Inc., Champaign, IL.Google Scholar
  • [36] Zamani M, Glineur F (2023) Exact convergence rate of the last iterate in subgradient methods. Preprint, submitted July 20, https://arxiv.org/abs/2307.11134.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.