[1] Assran M, Rabbat M (2020) On the convergence of Nesterov’s accelerated gradient method in stochastic settings. Proc. Machine Learning Res. 119:410–420.Google Scholar
[2] Aybat NS, Fallah A, Gurbuzbalaban M, Ozdaglar A (2019) A universally optimal multistage accelerated stochastic gradient method. Adv. Neural Inform. Processing Systems 32.Google Scholar
[3] Aybat NS, Fallah A, Gürbüzbalaban M, Ozdaglar A (2020) Robust accelerated gradient methods for smooth strongly convex functions. SIAM J. Optim. 30(1):717–751.Crossref, Google Scholar
[4] Bauschke HG, Borwein JM (1997) Legendre functions and the method of random Bregman projections. J. Convex Anal. 4(1):27–67.Google Scholar
[5] Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.Crossref, Google Scholar
[6] Bubeck S (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learning 8(3–4):231–357.Crossref, Google Scholar
[7] Cevher V, Vũ BC (2018) On the linear convergence of the stochastic gradient method with constant step-size. Optim. Lett. 13(5):1177–1187.Crossref, Google Scholar
[8] Chen X, Lin Q, Pena J (2012) Optimal regularized dual averaging methods for stochastic optimization. Adv. Neural Inform. Processing Systems 25.Google Scholar
[9] Cohen M, Diakonikolas J, Orecchia L (2018) On acceleration with noise-corrupted gradients. Proc. Machine Learning Res. 80:1019–1028.Google Scholar
[10] Cyrus S, Hu B, Scoy BV, Lessard L (2018) A robust accelerated optimization algorithm for strongly convex functions. 2018 Annu. Amer. Control Conf. ACC (IEEE, Piscataway, NJ), 1376–1381.Google Scholar
[11] Diakonikolas J, Orecchia L (2018) Accelerated extra-gradient descent: A novel accelerated first-order method. 9th Innovations Theor. Comput. Sci. Conf. ITCS 2018 (Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany), 23:1–23:19.Google Scholar
[12] Diakonikolas J, Orecchia L (2019) The approximate duality gap technique: A unified theory of first-order methods. SIAM J. Optim. 29(1):660–689.Crossref, Google Scholar
[13] Ghadimi S, Lan G (2013) Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization. II: Shrinking procedures and optimal algorithms. SIAM J. Optim. 23(4):2061–2089.Crossref, Google Scholar
[14] Gower RM, Loizou N, Qian X, Sailanbayev A, Shulgin E, Richtárik P (2019) SGD: General analysis and improved rates. Proc. Machine Learning Res. 97:5200–5209.Google Scholar
[15] Grimmer B (2019) Convergence rates for deterministic and stochastic subgradient methods without Lipschitz continuity. SIAM J. Optim. 29(2):1350–1365.Crossref, Google Scholar
[16] Gürbüzbalaban M, Ozdaglar A, Parrilo P (2015) A globally convergent incremental Newton method. Math. Program. 151(1):283–313.Crossref, Google Scholar
[17] Hu B, Lessard L (2017) Dissipativity theory for Nesterov’s accelerated method. Proc. Machine Learning Res. 70:1549–1557.Google Scholar
[18] Hu B, Seiler P, Lessard L (2020) Analysis of biased stochastic gradient descent using sequential semidefinite programs. Math. Program. 187(1–2):383–408.Crossref, Google Scholar
[19] Hu B, Wright S, Lessard L (2018) Dissipativity theory for accelerating stochastic variance reduction: A unified analysis of SVRG and Katyusha using semidefinite programs. Proc. Machine Learning Res. 80:2038–2047.Google Scholar
[20] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Accelerating stochastic gradient descent for least squares regression. Proc. Machine Learning Res. 75:545–604.Google Scholar
[21] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learning Res. 18(223):1–42.Google Scholar
[22] Jofré A, Thompson P (2018) On variance reduction for stochastic smooth convex optimization with multiplicative noise. Math. Program. 174(1–2):253–292.Crossref, Google Scholar
[23] Kidambi R, Netrapalli P, Jain P, Kakade S (2018) On the insufficiency of existing momentum schemes for stochastic optimization. Inform. Theory Appl. Workshop ITA (IEEE, Piscataway, NJ), 1–9.Google Scholar
[24] Lessard L, Recht B, Packard A (2016) Analysis and design of optimization algorithms via integral quadratic constraints. SIAM J. Optim. 26(1):57–95.Crossref, Google Scholar
[25] Liu C, Belkin M (2020) Accelerating SGD with momentum for over-parameterized learning. Internat. Conf. Learning Representations. Preprint, submitted October 31, https://arxiv.org/abs/1810.13395.Google Scholar
[26] Mourtada J (2022) Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices. Ann. Statist. 50(4):2157–2178.Crossref, Google Scholar
[27] Nemirovskij AS, Yudin DB (1983) Problem Complexity and Method Efficiency in Optimization (Wiley-Interscience, Chichester, UK).Google Scholar
[28] Nesterov YE (1983) A method for solving the convex programming problem with convergence rate o(1/k2). Dokl. Akad. Nauk SSSR 269:543–547.Google Scholar
[29] Nesterov Y (2004) Introductory Lectures on Convex Optimization, Applied Optimization, vol. 87 (Springer US, New York).Google Scholar
[30] Nguyen LM, Nguyen PH, Richtárik P, Scheinberg K, Takác M, van Dijk M (2019) New convergence aspects of stochastic gradient algorithms. J. Machine Learning Res. 20:176:1–176:49.Google Scholar
[31] Polyak B (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
[32] Rakhlin A, Shamir O, Sridharan K (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Machine Learning, 1571–1578.Google Scholar
[33] Schmidt M, Roux NL (2013) Fast convergence of stochastic gradient descent under a strong growth condition. Preprint, submitted August 29, https://arxiv.org/abs/1308.6370.Google Scholar
[34] Scoy BV, Freeman RA, Lynch KM (2018) The fastest known globally convergent first-order method for minimizing strongly convex functions. IEEE Control Systems Lett. 2(1):49–54.Crossref, Google Scholar
[35] Solodov M (1998) Incremental gradient algorithms with stepsizes bounded away from zero. Comput. Optim. Appl. 11(1):23–35.Crossref, Google Scholar
[36] Stich SU (2019) Unified optimal analysis of the (stochastic) gradient method. Preprint, submitted July 9, https://arxiv.org/abs/1907.04232.Google Scholar
[37] Tseng P (1998) An incremental gradient(-projection) method with momentum term and adaptive stepsize rule. SIAM J. Optim. 8(2):506–531.Crossref, Google Scholar
[38] Vaswani S, Bach FR, Schmidt M (2019) Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron. Proc. Machine Learning Res. 89:1195–1204.Google Scholar
[39] Xiao L (2010) Dual averaging methods for regularized stochastic learning and online optimization. J. Machine Learning Res. 11:2543–2596.Google Scholar
[40] Xu P, Wang T, Gu Q (2018) Continuous and discrete-time accelerated stochastic mirror descent for strongly convex functions. Proc. Machine Learning Res. 80:5492–5501.Google Scholar
[41] Zhou X (2018) On the Fenchel duality between strong convexity and Lipschitz continuous gradient. Preprint, submitted March 17, https://arxiv.org/abs/1803.06573.Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 4

November 2024

Pages 2049-2802, C2

Article Information

Metrics

Information

Received:November 02, 2021
Accepted:October 30, 2023
Published Online:December 06, 2023

Cite as

You-Lin Chen, Sen Na, Mladen Kolar (2023) Convergence Analysis of Accelerated Stochastic Gradient Descent Under the Growth Condition. Mathematics of Operations Research 49(4):2492-2526.

https://doi.org/10.1287/moor.2021.0293

Keywords

Acknowledgments

You-Lin Chen and Sen Na contributed equally to the manuscript.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Convergence Analysis of Accelerated Stochastic Gradient Descent Under the Growth Condition

References

Volume 49, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News