Variance-Reduced Accelerated First-Order Methods: Central Limit Theorems and Confidence Statements
Published Online:4 Jun 2024https://doi.org/10.1287/moor.2021.0068
References
- [1] (2017) First-Order Methods in Optimization (SIAM, Philadelphia).Crossref, Google Scholar
- [2] (1989) Parallel and Distributed Computation: Numerical Methods, International ed. (Prentice-Hall International, Hoboken, NJ).Google Scholar
- [3] (2013) Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41(4):1922–1943.Crossref, Google Scholar
- [4] (1997) Introduction to Stochastic Programming, Springer Series in Operations Research (Springer, Berlin, Heidelberg).Google Scholar
- [5] (1954) Multidimensional stochastic approximation methods. Ann. Math. Statist. 25(4):737–744.Crossref, Google Scholar
- [6] (2016) Singular inverse Wishart distribution and its application to portfolio theory. J. Multivariate Anal. 143(1):314–326.Crossref, Google Scholar
- [7] (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48 (Springer, Cham, Switzerland).Google Scholar
- [8] (1998) On-line learning and stochastic approximations. Saad D, ed. On-Line Learning in Neural Networks (Cambridge University Press, Cambridge, UK), 9–42.Google Scholar
- [9] (2012) Sample size selection in optimization methods for machine learning. Math. Programming 134(1):127–155.Crossref, Google Scholar
- [10] (2018) Convergence diagnostics for stochastic gradient descent with constant learning rate. 21st Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1476–1485.Google Scholar
- [11] (2006) Stochastic Approximation and Its Applications, vol. 64 (Springer Science & Business Media, New York).Google Scholar
- [12] (2020) Statistical inference for model parameters in stochastic gradient descent. Ann. Statist. 48(1):251–273.Crossref, Google Scholar
- [13] (2008) Asymptotic Theory of Statistics and Probability (Springer Science & Business Media, New York).Google Scholar
- [14] (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Proc. 27th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1646–1654.Google Scholar
- [15] (2012) Probability and Statistics (Pearson Education, London).Google Scholar
- [16] (2009) Variable-number sample-path optimization. Math. Programming 117(1–2):81–109.Crossref, Google Scholar
- [17] (2020) Bridging the gap between constant step size stochastic gradient descent and Markov chains. Ann. Statist. 48(3):1348–1382.Crossref, Google Scholar
- [18] (1968) On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39(4):1327–1332.Crossref, Google Scholar
- [19] (2012) Hybrid deterministic-stochastic methods for data fitting. SIAM J. Sci. Comput. 34(3):A1380–A1405.Crossref, Google Scholar
- [20] (2018) Stochastic heavy ball. Electronic J. Statist. 12(1):461–529.Crossref, Google Scholar
- [21] (2016) Accelerated gradient methods for nonconvex nonlinear and stochastic programming. Math. Programming 156(1):59–99.Crossref, Google Scholar
- [22] (2015) Global convergence of the heavy-ball method for convex optimization. 2015 Eur. Control Conf. (ECC) (IEEE, Piscataway, NJ), 310–315.Google Scholar
- [23] (2023) Provable non-accelerations of the heavy-ball method. Preprint, submitted July 21, https://arxiv.org/abs/2307.11291.Google Scholar
- [24] (2014) Martingale Limit Theory and Its Application (Academic Press, Cambridge, MA).Google Scholar
- [25] (1973) Stochastic Approximation and Recursive Estimation, vol. 47 (American Mathematical Society, Providence, RI).Google Scholar
- [26] (2003) Variable-sample methods for stochastic optimization. ACM Trans. Model. Comput. Simulation 13(2):108–133.Crossref, Google Scholar
- [27] (2002) Recent advances in simulation optimization: Confidence regions for stochastic approximation algorithms. Proc. 34th Conf. Winter Simulation Exploring New Frontiers (Winter Simulation Conference, San Diego), 370–376.Google Scholar
- [28] (2022) Smoothed variable sample-size accelerated proximal methods for nonsmooth stochastic convex programs. Stochastic Systems 12(4):373–410.Link, Google Scholar
- [29] (2019) On variance reduction for stochastic smooth convex optimization with multiplicative noise. Math. Programming 174(1–2):253–292.Crossref, Google Scholar
- [30] (2013) Accelerating stochastic gradient descent using predictive variance reduction. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 315–323.Google Scholar
- [31] (2012) Imagenet classification with deep convolutional neural networks. Pereira F, Burges CJC, Bottou L, Weinberger KQ, eds. Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1097–1105.Google Scholar
- [32] (2003) Stochastic Approximation and Recursive Algorithms and Applications, vol. 35 (Springer Science & Business Media, New York).Google Scholar
- [33] (2017) Individual confidence intervals for solutions to expected value formulations of stochastic variational inequalities. Math. Programming 165(1):151–196.Crossref, Google Scholar
- [34] (2022) Asynchronous variance-reduced block schemes for composite non-convex stochastic optimization: Block-specific steplengths and adapted batch-sizes. Optim. Methods Software 37(1):264–294.Crossref, Google Scholar
- [35] (2022) Distributed variable sample-size gradient-response and best-response schemes for stochastic Nash equilibrium problems. SIAM J. Optim. 32(2):573–603.Crossref, Google Scholar
- [36] (1972) Some characterizations of the multivariate t distribution. J. Multivariate Anal. 2(3):339–344.Crossref, Google Scholar
- [37] (2014) A new method to build confidence regions for solutions of stochastic variational inequalities. Optimization 63(9):1431–1443.Crossref, Google Scholar
- [38] (2017) Stochastic gradient descent as approximate Bayesian inference. J. Machine Learn. Res. 18(1):4873–4907.Google Scholar
- [39] (2012) Markov Chains and Stochastic Stability (Springer Science & Business Media, New York).Google Scholar
- [40] (1983) Problem Complexity and Method Efficiency in Optimization (Wiley, New York).Google Scholar
- [41] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- [42] (1983) A method for solving the convex programming problem with convergence rate O(1/k2). Doklady Akademii nauk SSSR 269:543–547.Google Scholar
- [43] (2013) Introductory Lectures on Convex Optimization: A Basic Course, vol. 87 (Springer Science & Business Media, New York).Google Scholar
- [44] (1999) Numerical Optimization, Springer Series in Operations Research (Springer-Verlag, New York).Crossref, Google Scholar
- [45] (2010) On choosing parameters in retrospective-approximation algorithms for stochastic root finding and simulation optimization. Oper. Res. 58(4 Part 1):889–901.Link, Google Scholar
- [46] (2011) The stochastic root-finding problem: Overview, solutions, and open questions. ACM Trans. Model. Comput. Simulation 21(3):19.Crossref, Google Scholar
- [47] (2018) On sampling rates in simulation-based recursions. SIAM J. Optim. 28(1):45–73.Crossref, Google Scholar
- [48] (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
- [49] (1987) Introduction to Optimization (Optimization Software Publications Division, New York).Google Scholar
- [50] (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
- [51] (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
- [52] (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Elsevier, Amsterdam), 233–257.Google Scholar
- [53] Convergence rates of inexact proximal-gradient methods for convex optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1458–1466.Google Scholar
- [54] (2015) Budget-constrained stochastic approximation. Proc. 2015 Winter Simulation Conf. (IEEE Press, Piscataway, NJ), 368–379.Google Scholar
- [55] (2014) Lectures on Stochastic Programming: Modeling and Theory, 2nd ed. (SIAM, Philadelphia).Crossref, Google Scholar
- [56] (2000) Determinants of block matrices. Math. Gazette 84(501):460–467.Crossref, Google Scholar
- [57] (2000) Adaptive stochastic approximation by the simultaneous perturbation method. IEEE Trans. Automatic Control 45(10):1839–1853.Crossref, Google Scholar
- [58] (2018) Uncertainty quantification for online learning and stochastic approximation via hierarchical incremental gradient descent. Preprint, submitted February 13, https://arxiv.org/abs/1802.04876.Google Scholar
- [59] (2013) On the importance of initialization and momentum in deep learning. Dasgupta S, ed. 30th Internat. Conf. Machine Learn. (ICML, San Diego), 1139–1147.Google Scholar
- [60] (2015) Going deeper with convolutions. Proc. 2015 IEEE Conf. Comput. Vision Pattern Recognition (CVPR) (IEEE Press, Piscataway, NJ), 1–9.Google Scholar
- [61] (2014) Statistical analysis of stochastic gradient methods for generalized linear models. Xing EP, ed. 31th Internat. Conf. Machine Learn. (ICML, San Diego), 667–675.Google Scholar
- [62] (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar
- [63] (1967) An extension of the Robbins-Monro procedure. Ann. Math. Statist. 38(1):181–190.Crossref, Google Scholar
- [64] (1987) Multivariate adaptive stochastic approximation. Ann. Statist. 15(3):1115–1130.Crossref, Google Scholar
- [65] (2016) Unified convergence analysis of stochastic momentum methods for convex and non-convex optimization. Preprint, submitted April 12, https://arxiv.org/abs/1604.03257.Google Scholar
- [66] (2012) On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica J. IFAC 48(1):56–67.Crossref, Google Scholar
- [67] (2021) On constructing confidence region for model parameters in stochastic gradient descent via batch means. Proc. 2021 Winter Simulation Conf. (IEEE Press, Piscataway, NJ).Google Scholar

