Convergence Analysis of Accelerated Stochastic Gradient Descent Under the Growth Condition
Published Online:6 Dec 2023https://doi.org/10.1287/moor.2021.0293
References
- [1] (2020) On the convergence of Nesterov’s accelerated gradient method in stochastic settings. Proc. Machine Learning Res. 119:410–420.Google Scholar
- [2] (2019) A universally optimal multistage accelerated stochastic gradient method. Adv. Neural Inform. Processing Systems 32.Google Scholar
- [3] (2020) Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1):717–751.Crossref, Google Scholar
- [4] (1997) Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1):27–67.Google Scholar
- [5] (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.Crossref, Google Scholar
- [6] (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learning 8(3–4):231–357.Crossref, Google Scholar
- [7] (2018) On the linear convergence of the stochastic gradient method with constant step-size. Optim. Lett. 13(5):1177–1187.Crossref, Google Scholar
- [8] (2012) Optimal regularized dual averaging methods for stochastic optimization. Adv. Neural Inform. Processing Systems 25.Google Scholar
- [9] (2018) On acceleration with noise-corrupted gradients. Proc. Machine Learning Res. 80:1019–1028.Google Scholar
- [10] (2018) A robust accelerated optimization algorithm for strongly convex functions. 2018 Annu. Amer. Control Conf. ACC (IEEE, Piscataway, NJ), 1376–1381.Google Scholar
- [11] (2018) Accelerated extra-gradient descent: A novel accelerated first-order method. 9th Innovations Theor. Comput. Sci. Conf. ITCS 2018 (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany), 23:1–23:19.Google Scholar
- [12] (2019) The approximate duality gap technique: A unified theory of first-order methods. SIAM J. Optim. 29(1):660–689.Crossref, Google Scholar
- [13] (2013) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization. II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4):2061–2089.Crossref, Google Scholar
- [14] (2019) SGD: General analysis and improved rates. Proc. Machine Learning Res. 97:5200–5209.Google Scholar
- [15] (2019) Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity. SIAM J. Optim. 29(2):1350–1365.Crossref, Google Scholar
- [16] (2015) A globally convergent incremental Newton method. Math. Program. 151(1):283–313.Crossref, Google Scholar
- [17] (2017) Dissipativity theory for Nesterov’s accelerated method. Proc. Machine Learning Res. 70:1549–1557.Google Scholar
- [18] (2020) Analysis of biased stochastic gradient descent using sequential semidefinite programs. Math. Program. 187(1–2):383–408.Crossref, Google Scholar
- [19] (2018) Dissipativity theory for accelerating stochastic variance reduction: A unified analysis of SVRG and Katyusha using semidefinite programs. Proc. Machine Learning Res. 80:2038–2047.Google Scholar
- [20] (2018) Accelerating stochastic gradient descent for least squares regression. Proc. Machine Learning Res. 75:545–604.Google Scholar
- [21] (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learning Res. 18(223):1–42.Google Scholar
- [22] (2018) On variance reduction for stochastic smooth convex optimization with multiplicative noise. Math. Program. 174(1–2):253–292.Crossref, Google Scholar
- [23] (2018) On the insufficiency of existing momentum schemes for stochastic optimization. Inform. Theory Appl. Workshop ITA (IEEE, Piscataway, NJ), 1–9.Google Scholar
- [24] (2016) Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1):57–95.Crossref, Google Scholar
- [25] (2020) Accelerating SGD with momentum for over-parameterized learning. Internat. Conf. Learning Representations. Preprint, submitted October 31, https://arxiv.org/abs/1810.13395.Google Scholar
- [26] (2022) Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices. Ann. Statist. 50(4):2157–2178.Crossref, Google Scholar
- [27] (1983) Problem Complexity and Method Efficiency in Optimization (Wiley-Interscience, Chichester, UK).Google Scholar
- [28] (1983) A method for solving the convex programming problem with convergence rate o(1/k2). Dokl. Akad. Nauk SSSR 269:543–547.Google Scholar
- [29] (2004) Introductory Lectures on Convex Optimization, Applied Optimization, vol. 87 (Springer US, New York).Google Scholar
- [30] (2019) New convergence aspects of stochastic gradient algorithms. J. Machine Learning Res. 20:176:1–176:49.Google Scholar
- [31] (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
- [32] (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Machine Learning, 1571–1578.Google Scholar
- [33] (2013) Fast convergence of stochastic gradient descent under a strong growth condition. Preprint, submitted August 29, https://arxiv.org/abs/1308.6370.Google Scholar
- [34] (2018) The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1):49–54.Crossref, Google Scholar
- [35] (1998) Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 11(1):23–35.Crossref, Google Scholar
- [36] (2019) Unified optimal analysis of the (stochastic) gradient method. Preprint, submitted July 9, https://arxiv.org/abs/1907.04232.Google Scholar
- [37] (1998) An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 8(2):506–531.Crossref, Google Scholar
- [38] (2019) Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. Proc. Machine Learning Res. 89:1195–1204.Google Scholar
- [39] (2010) Dual averaging methods for regularized stochastic learning and online optimization. J. Machine Learning Res. 11:2543–2596.Google Scholar
- [40] (2018) Continuous and discrete-time accelerated stochastic mirror descent for strongly convex functions. Proc. Machine Learning Res. 80:5492–5501.Google Scholar
- [41] (2018) On the Fenchel duality between strong convexity and Lipschitz continuous gradient. Preprint, submitted March 17, https://arxiv.org/abs/1803.06573.Google Scholar

