Exponential Concentration in Stochastic Approximation
References
- (2003) Applied Probability and Queues, vol. 2 (Springer, New York).Google Scholar
- (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA).Google Scholar
- (2001) Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11(4):1384–1428.Google Scholar
- (2018) Optimization methods for large-scale machine learning. Siam Rev. 60(2):223–311.Crossref, Google Scholar
- (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3):200–217.Crossref, Google Scholar
- (2002) Rate of convergence for constrained stochastic approximation algorithms. SIAM J. Control Optim. 40(4):1011–1041.Crossref, Google Scholar
- (1997) Parallel Optimization: Theory, Algorithms, and Applications, Numerical Mathematics and Scientific Computation (Oxford University Press, Oxford, UK).Google Scholar
- (2022) Stationary behavior of constant stepsize SGD type algorithms: An asymptotic characterization. Proc. ACM Measurement Anal. Comput. Systems 6(1):1–24.Crossref, Google Scholar
- (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
- (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.Crossref, Google Scholar
- (2019) Stochastic algorithms with geometric step decay converge linearly on sharp functions. Math. Programming 207:145–190.Crossref, Google Scholar
- (2023) Asymptotic normality and optimality in nonsmooth stochastic approximation. Ann Statist. 52(4):1485–1508.Google Scholar
- (2009) Large Deviations Techniques and Applications (Springer, New York).Google Scholar
- (1987) A parallel subgradient projections method for the convex feasibility problem. J. Comput. Appl. Math. 18(3):307–320.Crossref, Google Scholar
- (2008) Efficient projections onto the l 1-ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (ACM, New York), 272–279.Google Scholar
- (2021) Asymptotic optimality in stochastic optimization. Ann. Statist. 49(1):21–48.Crossref, Google Scholar
- (1967) Stochastic approximation of minima with improved asymptotic speed. Ann. Math. Statist. 38(1):191–200.Crossref, Google Scholar
- (1968) On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39(4):1327–1332.Crossref, Google Scholar
- (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.Crossref, Google Scholar
- (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.Crossref, Google Scholar
- (1972) Local asymptotic minimax and admissibility in estimation. Proc. Sixth Berkeley Sympos. Math. Statist. Probab. (Berkeley), vol. 1, 175–194.Google Scholar
- (1981) On the distribution of multidimensional reflected Brownian motion. SIAM J. Appl. Math. 41(2):345–361.Crossref, Google Scholar
- (1987) Multidimensional reflected Brownian motions having exponential stationary distributions. Ann. Probab. 15(1):115–137.Google Scholar
- (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.Crossref, Google Scholar
- (2012) Projection-free online learning. 29th Internat. Conf. Machine Learn., ICML 2012 (ACM, New York), 521–528.Google Scholar
- (2016) Variance-reduced and projection-free stochastic optimization. Internat. Conf. Machine Learn. (PMLR, New York), 1263–1271.Google Scholar
- (1990) On the convergence properties of Hildreth’s quadratic programming algorithm. Math. Programming 47(1–3):37–51.Crossref, Google Scholar
- (2013) Revisiting frank-wolfe: Projection-free sparse convex optimization. Internat. Conf. Machine Learn. (PMLR, New York), 427–435.Google Scholar
- (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.Crossref, Google Scholar
- (1964) A martingale inequality in the theory of queues. Math. Proc. Cambridge Philos. Soc. 60(2):359–361.Crossref, Google Scholar
- (2012) Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences (Springer, New York).Google Scholar
- (2003) Stochastic Approximation and Recursive Algorithms and Applications, Stochastic Modelling and Applied Probability (Springer, New York).Google Scholar
- (1953) On some asymptotic properties of maximum likelihood estimates and related Bayes’ estimates. Univ. Calif. Publ. Statist. 1:277–330.Google Scholar
- (1984) Convergence of the cyclical relaxation method for linear inequalities. Math. Programming 30:218–228.Crossref, Google Scholar
- (2012) Markov Chains and Stochastic Stability (Springer Science & Business Media, London).Google Scholar
- (1986) A finite algorithm for finding the projection of a point onto the canonical simplex of αn. J. Optim. Theory Appl. 50:195–200.Crossref, Google Scholar
- (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Inc., Red Hoook, NY).Google Scholar
- (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
- (1951) A stochastic approximation method. Ann. Math. Statist. 400–407.Crossref, Google Scholar
- (1982) Almost sure approximations to the Robbins-Monro and Kiefer-Wolfowitz processes with dependent noise. Ann. Probab. 10(1):178–187.Crossref, Google Scholar
- (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.Crossref, Google Scholar
- (1971) Diffusion processes with boundary conditions. Comm. Pure Appl. Math. 24(2):147–225.Crossref, Google Scholar
- (1997) Multidimensional Diffusion Processes, vol. 233 (Springer Science & Business Media, New York).Google Scholar
- (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA; London).Google Scholar
- (1991) Probability with Martingales (Cambridge University Press).Crossref, Google Scholar
- (2025) Code for exponential concentration in stochastic approximation. Accessed January 9, 2025, https://github.com/Shangda-Yang/PSGD.Google Scholar

