On Averaging and Extrapolation for Gradient Descent
References
- [1] (2018) Greed, hedging, and acceleration in convex optimization. PhD thesis, Massachusetts Institute of Technology, Cambridge.Google Scholar
- [2] (2024) Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Math. Programming, ePub ahead of print November 25, https://doi.org/10.1007/s10107-024-02164-2.Crossref, Google Scholar
- [3] (1965) Iterative procedures for nonlinear integral equations. J. ACM 12(4):547–560.Crossref, Google Scholar
- [4] (2023) Principled analyses and design of first-order methods with inexact proximal operators. Math. Programming 201:185–230.Crossref, Google Scholar
- [5] (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):1–27.Crossref, Google Scholar
- [6] (2024) Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization method. Math. Programming 204:567–639.Crossref, Google Scholar
- [7] (2014) Contributions to the complexity analysis of optimization algorithms. PhD thesis, Tel-Aviv University, Tel Aviv, Israel.Google Scholar
- [8] (2017) The exact information-based complexity of smooth convex minimization. J. Complexity 39:1–16.Crossref, Google Scholar
- [9] (2020) Efficient first-order methods for convex minimization: A constructive approach. Math. Programming 184:183–220.Crossref, Google Scholar
- [10] (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145:451–482.Crossref, Google Scholar
- [11] (2017) JuMP: A modeling language for mathematical optimization. SIAM Rev. 59(2):295–320.Crossref, Google Scholar
- [12] (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE, New York), 310–315.Google Scholar
- [13] (2024) PEPit: Computer-assisted worst-case analyses of first-order optimization methods in Python. Math. Programming 16:337–367.Crossref, Google Scholar
- [14] (2024) Provably faster gradient descent via long steps. SIAM J. Optim. 34(3):2588–2608.Crossref, Google Scholar
- [15] (2024) Some primal-dual theory for subgradient methods for strongly convex optimization. Preprint, submitted June 27, https://arxiv.org/abs/2305.17323.Google Scholar
- [16] (2023) Accelerated gradient descent via long steps. Preprint, submitted September 18, https://arxiv.org/abs/2309.09961.Google Scholar
- [17] (2020) Tight sublinear convergence rate of the proximal point algorithm for maximal monotone inclusion problems. SIAM J. Optim. 30(3):1905–1921.Crossref, Google Scholar
- [18] (2015) Primal convergence from dual subgradient methods for convex optimization. Math. Programming 150:365–390.Crossref, Google Scholar
- [19] (2021) Accelerated proximal point method for maximally monotone operators. Math. Programming 190:57–87.Crossref, Google Scholar
- [20] (2024) A proof of the exact convergence rate of gradient descent. Preprint, submitted December 5, https://arxiv.org/abs/2412.04427.Google Scholar
- [21] (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159:81–107.Crossref, Google Scholar
- [22] (2023) Time-reversed dissipation induces duality between minimizing gradient norm and function value. Preprint, submitted May 11, https://arxiv.org/abs/2305.06628.Google Scholar
- [23] (2020) First-Order and Stochastic Optimization Methods for Machine Learning (Springer, Berlin).Crossref, Google Scholar
- [24] (2018) Projection based methods for conic linear programming. PhD thesis, Heinrich-Heine-Universität Düsseldorf, Düsseldorf, Germany.Google Scholar
- [25] (2020) Anderson acceleration of proximal gradient methods. Daumé H III, Singh A, eds. Proc. 37th Internat. Conf. Machine Learn. ICML’20 (JMLR.org), 6620–6629.Google Scholar
- [26] (2024) Benchopt: Reproducible, efficient and collaborative optimization benchmarks. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Proc. 36th Internat. Conf. Neural Inform. Processing Systems NIPS ‘22 (Curran Associates Inc., Red Hook, NY), 25404–25421.Google Scholar
- [27] MOSEK ApS (2024) MOSEK optimizer API for Julia manual, version 10.0. MOSEK ApS, Copenhagen.Google Scholar
- [28] (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Proc. USSR Acad. Sci. 269(3):543–547.Google Scholar
- [29] (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
- [30] (2013) Stochastic gradient descent for non-smooth optimization: Convergence results and optimal averaging schemes. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28 (PMLR, New York), 71–79.Google Scholar
- [31] (2017) Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3):1283–1313.Crossref, Google Scholar
- [32] (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Programming 161:307–345.Crossref, Google Scholar
- [33] (2023) An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Programming 201:63–96.Crossref, Google Scholar
- [34] (2024) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum. Preprint, submitted October 11, https://arxiv.org/abs/2410.08890.Google Scholar
- [35] Wolfram Research, Inc. (2024) Mathematica, version 14.0. Wolfram Research, Inc., Champaign, IL.Google Scholar
- [36] (2023) Exact convergence rate of the last iterate in subgradient methods. Preprint, submitted July 20, https://arxiv.org/abs/2307.11134.Google Scholar

