Composing Optimized Stepsize Schedules for Gradient Descent
Published Online:4 Nov 2025
References
- [1] (2025) Acceleration by stepsize hedging: Silver stepsize schedule for smooth convex optimization. Math. Programming 213:1105–1118.Crossref, Google Scholar
- [2] (2025) Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule. J. ACM 72(2):1–38.Crossref, Google Scholar
- [3] (2024) Branch-and-bound performance estimation programming: A unified methodology for constructing optimal optimization methods. Math. Program. 204(1–2):567–639. Crossref, Google Scholar
- [4] (2014) Performance of first-order methods for smooth convex minimization: A novel approach. Math. Programming 145:451–482.Crossref, Google Scholar
- [5] (2023) Accelerated gradient descent via long steps. Preprint, submitted September 27, https://arxiv.org/abs/2309.09961.Google Scholar
- [6] (2024) A strengthened conjecture on the minimax optimal constant stepsize for gradient descent. Preprint, submitted July 16, https://arxiv.org/abs/2407.11739.Google Scholar
- [7] (2025) Accelerated objective gap and gradient norm convergence for gradient descent via long steps. INFORMS J. Optim. 7(2):156–169.Link, Google Scholar
- [8] (2025) A proof of exact convergence rate of gradient descent. Part II. Performance criterion (f(xn)−f*)/‖x0−x*‖2. Preprint, submitted March 26, https://arxiv.org/abs/2412.04427.Google Scholar
- [9] (2016) Optimized first-order methods for smooth convex minimization. Math. Programming 159(1):81–107.Crossref, Google Scholar
- [10] (2021) Optimizing the efficiency of first-order methods for decreasing the gradient of smooth convex functions. J. Optim. Theory Appl. 188(1):192–219.Crossref, Google Scholar
- [11] (2023) Time-reversed dissipation induces duality between minimizing gradient norm and function value. Adv. Neural Inform. Processing Systems 36:23389–23440.Google Scholar
- [12] (2024) On averaging and extrapolation for gradient descent. Preprint, submitted February 19, https://arxiv.org/abs/2402.12493.Google Scholar
- [13] (2020) Adaptive gradient descent without descent. Proc. 37th Internat. Conf. Machine Learn. ICML 2020, vol. 119 (PMLR, New York), 6702–6712.Google Scholar
- [14] (1983) A method for solving the convex programming problem with convergence rate o(1/k2). Proc. USSR Acad. Sci. 269:543–547.Google Scholar
- [15] (2014) Introductory Lectures on Convex Optimization: A Basic Course, 1st ed. (Springer Publishing Company, New York).Google Scholar
- [16] (2024) Exact worst-case convergence rates of gradient descent: A complete analysis for all constant stepsizes over nonconvex and convex functions. Preprint, submitted June 25, https://arxiv.org/abs/2406.17506.Google Scholar
- [17] (2017) Exact worst-case performance of first-order methods for composite convex optimization. SIAM J. Optim. 27(3):1283–1313.Crossref, Google Scholar
- [18] (2017) Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Programming 161:307–345.Crossref, Google Scholar
- [19] (2023) An elementary approach to tight worst case complexity analysis of gradient based methods. Math. Programming 201(1–2):63–96.Crossref, Google Scholar
- [20] (2024) Relaxed proximal point algorithm: Tight complexity bounds and acceleration without momentum. Preprint, submitted October 11, https://arxiv.org/abs/2410.08890.Google Scholar
- [21] (2024) Accelerated gradient descent by concatenation of stepsize schedules. Preprint, submitted October 16, https://arxiv.org/abs/2410.12395.Google Scholar

