Variance-Reduced Accelerated First-Order Methods: Central Limit Theorems and Confidence Statements

Published Online:https://doi.org/10.1287/moor.2021.0068

References

  • [1] Beck A (2017) First-Order Methods in Optimization (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [2] Bertsekas D, Tsitsiklis J (1989) Parallel and Distributed Computation: Numerical Methods, International ed. (Prentice-Hall International, Hoboken, NJ).Google Scholar
  • [3] Bickel P, Choi D, Chang X, Zhang H (2013) Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41(4):1922–1943.CrossrefGoogle Scholar
  • [4] Birge JR, Louveaux F (1997) Introduction to Stochastic Programming, Springer Series in Operations Research (Springer, Berlin, Heidelberg).Google Scholar
  • [5] Blum JR (1954) Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4):737–744.CrossrefGoogle Scholar
  • [6] Bodnar T, Mazur S, Podgórski K (2016) Singular inverse Wishart distribution and its application to portfolio theory. J. Multivariate Anal. 143(1):314–326.CrossrefGoogle Scholar
  • [7] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48 (Springer, Cham, Switzerland).Google Scholar
  • [8] Bottou L (1998) On-line learning and stochastic approximations. Saad D, ed. On-Line Learning in Neural Networks (Cambridge University Press, Cambridge, UK), 9–42.Google Scholar
  • [9] Byrd RH, Chin GM, Nocedal J, Wu Y (2012) Sample size selection in optimization methods for machine learning. Math. Programming 134(1):127–155.CrossrefGoogle Scholar
  • [10] Chee J, Toulis P (2018) Convergence diagnostics for stochastic gradient descent with constant learning rate. 21st Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1476–1485.Google Scholar
  • [11] Chen HF (2006) Stochastic Approximation and Its Applications, vol. 64 (Springer Science & Business Media, New York).Google Scholar
  • [12] Chen X, Lee JD, Tong XT, Zhang Y (2020) Statistical inference for model parameters in stochastic gradient descent. Ann. Statist. 48(1):251–273.CrossrefGoogle Scholar
  • [13] DasGupta A (2008) Asymptotic Theory of Statistics and Probability (Springer Science & Business Media, New York).Google Scholar
  • [14] Defazio A, Bach F, Lacoste-Julien S (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Proc. 27th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1646–1654.Google Scholar
  • [15] DeGroot MH, Schervish MJ (2012) Probability and Statistics (Pearson Education, London).Google Scholar
  • [16] Deng G, Ferris MC (2009) Variable-number sample-path optimization. Math. Programming 117(1–2):81–109.CrossrefGoogle Scholar
  • [17] Dieuleveut A, Durmus A, Bach F (2020) Bridging the gap between constant step size stochastic gradient descent and Markov chains. Ann. Statist. 48(3):1348–1382.CrossrefGoogle Scholar
  • [18] Fabian V (1968) On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39(4):1327–1332.CrossrefGoogle Scholar
  • [19] Friedlander MP, Schmidt M (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.CrossrefGoogle Scholar
  • [20] Gadat S, Panloup F, Saadane S (2018) Stochastic heavy ball. Electronic J. Statist. 12(1):461–529.CrossrefGoogle Scholar
  • [21] Ghadimi S, Lan G (2016) Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Programming 156(1):59–99.CrossrefGoogle Scholar
  • [22] Ghadimi E, Feyzmahdavian HR, Johansson M (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE, Piscataway, NJ), 310–315.Google Scholar
  • [23] Goujaud B, Taylor A, Dieuleveut A (2023) Provable non-accelerations of the heavy-ball method. Preprint, submitted July 21, https://arxiv.org/abs/2307.11291.Google Scholar
  • [24] Hall P, Heyde CC (2014) Martingale Limit Theory and Its Application (Academic Press, Cambridge, MA).Google Scholar
  • [25] Hasminskii RZ, Silver B (1973) Stochastic Approximation and Recursive Estimation, vol. 47 (American Mathematical Society, Providence, RI).Google Scholar
  • [26] Homem-de-Mello T (2003) Variable-sample methods for stochastic optimization. ACM Trans. Model. Comput. Simulation 13(2):108–133.CrossrefGoogle Scholar
  • [27] Hsieh MH, Glynn PW (2002) Recent advances in simulation optimization: Confidence regions for stochastic approximation algorithms. Proc. 34th Conf. Winter Simulation Exploring New Frontiers (Winter Simulation Conference, San Diego), 370–376.Google Scholar
  • [28] Jalilzadeh A, Shanbhag U, Blanchet J, Glynn PW (2022) Smoothed variable sample-size accelerated proximal methods for nonsmooth stochastic convex programs. Stochastic Systems 12(4):373–410.LinkGoogle Scholar
  • [29] Jofré A, Thompson P (2019) On variance reduction for stochastic smooth convex optimization with multiplicative noise. Math. Programming 174(1–2):253–292.CrossrefGoogle Scholar
  • [30] Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 315–323.Google Scholar
  • [31] Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1097–1105.Google Scholar
  • [32] Kushner H, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications, vol. 35 (Springer Science & Business Media, New York).Google Scholar
  • [33] Lamm M, Lu S, Budhiraja A (2017) Individual confidence intervals for solutions to expected value formulations of stochastic variational inequalities. Math. Programming 165(1):151–196.CrossrefGoogle Scholar
  • [34] Lei J, Shanbhag UV (2022) Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization: Block-specific steplengths and adapted batch-sizes. Optim. Methods Software 37(1):264–294.CrossrefGoogle Scholar
  • [35] Lei J, Shanbhag UV (2022) Distributed variable sample-size gradient-response and best-response schemes for stochastic Nash equilibrium problems. SIAM J. Optim. 32(2):573–603.CrossrefGoogle Scholar
  • [36] Lin PE (1972) Some characterizations of the multivariate t distribution. J. Multivariate Anal. 2(3):339–344.CrossrefGoogle Scholar
  • [37] Lu S (2014) A new method to build confidence regions for solutions of stochastic variational inequalities. Optimization 63(9):1431–1443.CrossrefGoogle Scholar
  • [38] Mandt S, Hoffman MD, Blei DM (2017) Stochastic gradient descent as approximate Bayesian inference. J. Machine Learn. Res. 18(1):4873–4907.Google Scholar
  • [39] Meyn SP, Tweedie RL (2012) Markov Chains and Stochastic Stability (Springer Science & Business Media, New York).Google Scholar
  • [40] Nemirovskiĭ A, Yudin D (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, New York).Google Scholar
  • [41] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [42] Nesterov YE (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii nauk SSSR 269:543–547.Google Scholar
  • [43] Nesterov Y (2013) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Springer Science & Business Media, New York).Google Scholar
  • [44] Nocedal J, Wright SJ (1999) Numerical Optimization, Springer Series in Operations Research (Springer-Verlag, New York).CrossrefGoogle Scholar
  • [45] Pasupathy R (2010) On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization. Oper. Res. 58(4 Part 1):889–901.LinkGoogle Scholar
  • [46] Pasupathy R, Kim S (2011) The stochastic root-finding problem: Overview, solutions, and open questions. ACM Trans. Model. Comput. Simulation 21(3):19.CrossrefGoogle Scholar
  • [47] Pasupathy R, Glynn PW, Ghosh S, Hashemi FS (2018) On sampling rates in simulation-based recursions. SIAM J. Optim. 28(1):45–73.CrossrefGoogle Scholar
  • [48] Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.CrossrefGoogle Scholar
  • [49] Polyak BT (1987) Introduction to Optimization (Optimization Software Publications Division, New York).Google Scholar
  • [50] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.CrossrefGoogle Scholar
  • [51] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • [52] Robbins H, Siegmund D (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Elsevier, Amsterdam), 233–257.Google Scholar
  • [53] Schmidt M, Roux NL, Bach FR Convergence rates of inexact proximal-gradient methods for convex optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1458–1466.Google Scholar
  • [54] Shanbhag UV, Blanchet JH (2015) Budget-constrained stochastic approximation. Proc. 2015 Winter Simulation Conf. (IEEE Press, Piscataway, NJ), 368–379.Google Scholar
  • [55] Shapiro A, Dentcheva D, Ruszczyński A (2014) Lectures on Stochastic Programming: Modeling and Theory, 2nd ed. (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [56] Silvester JR (2000) Determinants of block matrices. Math. Gazette 84(501):460–467.CrossrefGoogle Scholar
  • [57] Spall JC (2000) Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Automatic Control 45(10):1839–1853.CrossrefGoogle Scholar
  • [58] Su WJ, Zhu Y (2018) Uncertainty quantification for online learning and stochastic approximation via hierarchical incremental gradient descent. Preprint, submitted February 13, https://arxiv.org/abs/1802.04876.Google Scholar
  • [59] Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. Dasgupta S, ed. 30th Internat. Conf. Machine Learn. (ICML, San Diego), 1139–1147.Google Scholar
  • [60] Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. Proc. 2015 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE Press, Piscataway, NJ), 1–9.Google Scholar
  • [61] Toulis P, Airoldi E, Rennie J (2014) Statistical analysis of stochastic gradient methods for generalized linear models. Xing EP, ed. 31th Internat. Conf. Machine Learn. (ICML, San Diego), 667–675.Google Scholar
  • [62] Van der Vaart AW (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar
  • [63] Venter J (1967) An extension of the Robbins-Monro procedure. Ann. Math. Statist. 38(1):181–190.CrossrefGoogle Scholar
  • [64] Wei C (1987) Multivariate adaptive stochastic approximation. Ann. Statist. 15(3):1115–1130.CrossrefGoogle Scholar
  • [65] Yang T, Lin Q, Li Z (2016) Unified convergence analysis of stochastic momentum methods for convex and non-convex optimization. Preprint, submitted April 12, https://arxiv.org/abs/1604.03257.Google Scholar
  • [66] Yousefian F, Nedic A, Shanbhag UV (2012) On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica J. IFAC 48(1):56–67.CrossrefGoogle Scholar
  • [67] Zhu Y, Dong J (2021) On constructing confidence region for model parameters in stochastic gradient descent via batch means. Proc. 2021 Winter Simulation Conf. (IEEE Press, Piscataway, NJ).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.