Exponential Concentration in Stochastic Approximation

Kody J. H. Law
Kody J. H. Law
[email protected]
https://orcid.org/0000-0003-3133-2537
School of Mathematics, University of Manchester, Manchester M13 9PL, United Kingdom
Search for more papers by this author
,
Neil Walton
Corresponding Author
Neil Walton
[email protected]
https://orcid.org/0000-0002-5241-9765
Durham University Business School, Durham DH1 3LB, United Kingdom
Search for more papers by this author
,
Shangda Yang
Shangda Yang
[email protected]
School of Mathematics, University of Manchester, Manchester M13 9PL, United Kingdom
Search for more papers by this author

School of Mathematics, University of Manchester, Manchester M13 9PL, United Kingdom

Search for more papers by this author

Neil Walton

Corresponding Author

Neil Walton

[email protected]

https://orcid.org/0000-0002-5241-9765

Durham University Business School, Durham DH1 3LB, United Kingdom

Search for more papers by this author

Shangda Yang

[email protected]

School of Mathematics, University of Manchester, Manchester M13 9PL, United Kingdom

Search for more papers by this author

Published Online:22 Apr 2025https://doi.org/10.1287/opre.2023.0425

References

Asmussen S (2003) Applied Probability and Queues, vol. 2 (Springer, New York).Google Scholar
Bertsekas D (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA).Google Scholar
Bertsimas D, Gamarnik D, Tsitsiklis JN (2001) Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11(4):1384–1428.Google Scholar
Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. Siam Rev. 60(2):223–311.Crossref, Google Scholar
Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3):200–217.Crossref, Google Scholar
Buche R, Kushner HJ (2002) Rate of convergence for constrained stochastic approximation algorithms. SIAM J. Control Optim. 40(4):1011–1041.Crossref, Google Scholar
Censor Y, Zenios S (1997) Parallel Optimization: Theory, Algorithms, and Applications, Numerical Mathematics and Scientific Computation (Oxford University Press, Oxford, UK).Google Scholar
Chen Z, Mou S, Maguluri ST (2022) Stationary behavior of constant stepsize SGD type algorithms: An asymptotic characterization. Proc. ACM Measurement Anal. Comput. Systems 6(1):1–24.Crossref, Google Scholar
Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
Davis D, Drusvyatskiy D (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.Crossref, Google Scholar
Davis D, Drusvyatskiy D, Charisopoulos V (2019) Stochastic algorithms with geometric step decay converge linearly on sharp functions. Math. Programming 207:145–190.Crossref, Google Scholar
Davis D, Drusvyatskiy D, Jiang L (2023) Asymptotic normality and optimality in nonsmooth stochastic approximation. Ann Statist. 52(4):1485–1508.Google Scholar
Dembo A, Zeitouni O (2009) Large Deviations Techniques and Applications (Springer, New York).Google Scholar
Dos Santos LT (1987) A parallel subgradient projections method for the convex feasibility problem. J. Comput. Appl. Math. 18(3):307–320.Crossref, Google Scholar
Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the l 1-ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (ACM, New York), 272–279.Google Scholar
Duchi JC, Ruan F (2021) Asymptotic optimality in stochastic optimization. Ann. Statist. 49(1):21–48.Crossref, Google Scholar
Fabian V (1967) Stochastic approximation of minima with improved asymptotic speed. Ann. Math. Statist. 38(1):191–200.Crossref, Google Scholar
Fabian V (1968) On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39(4):1327–1332.Crossref, Google Scholar
Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.Crossref, Google Scholar
Hajek B (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.Crossref, Google Scholar
Hájek J (1972) Local asymptotic minimax and admissibility in estimation. Proc. Sixth Berkeley Sympos. Math. Statist. Probab. (Berkeley), vol. 1, 175–194.Google Scholar
Harrison JM, Reiman MI (1981) On the distribution of multidimensional reflected Brownian motion. SIAM J. Appl. Math. 41(2):345–361.Crossref, Google Scholar
Harrison JM, Williams RJ (1987) Multidimensional reflected Brownian motions having exponential stationary distributions. Ann. Probab. 15(1):115–137.Google Scholar
Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.Crossref, Google Scholar
Hazan E, Kale S (2012) Projection-free online learning. 29th Internat. Conf. Machine Learn., ICML 2012 (ACM, New York), 521–528.Google Scholar
Hazan E, Luo H (2016) Variance-reduced and projection-free stochastic optimization. Internat. Conf. Machine Learn. (PMLR, New York), 1263–1271.Google Scholar
Iusem AN, De Pierro AR (1990) On the convergence properties of Hildreth’s quadratic programming algorithm. Math. Programming 47(1–3):37–51.Crossref, Google Scholar
Jaggi M (2013) Revisiting frank-wolfe: Projection-free sparse convex optimization. Internat. Conf. Machine Learn. (PMLR, New York), 427–435.Google Scholar
Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.Crossref, Google Scholar
Kingman JFC (1964) A martingale inequality in the theory of queues. Math. Proc. Cambridge Philos. Soc. 60(2):359–361.Crossref, Google Scholar
Kushner H, Clark D (2012) Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences (Springer, New York).Google Scholar
Kushner H, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications, Stochastic Modelling and Applied Probability (Springer, New York).Google Scholar
Le Cam L (1953) On some asymptotic properties of maximum likelihood estimates and related Bayes’ estimates. Univ. Calif. Publ. Statist. 1:277–330.Google Scholar
Mandel J (1984) Convergence of the cyclical relaxation method for linear inequalities. Math. Programming 30:218–228.Crossref, Google Scholar
Meyn SP, Tweedie RL (2012) Markov Chains and Stochastic Stability (Springer Science & Business Media, London).Google Scholar
Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex of αn. J. Optim. Theory Appl. 50:195–200.Crossref, Google Scholar
Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Inc., Red Hoook, NY).Google Scholar
Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 400–407.Crossref, Google Scholar
Ruppert D (1982) Almost sure approximations to the Robbins-Monro and Kiefer-Wolfowitz processes with dependent noise. Ann. Probab. 10(1):178–187.Crossref, Google Scholar
Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.Crossref, Google Scholar
Stroock D, Varadhan S (1971) Diffusion processes with boundary conditions. Comm. Pure Appl. Math. 24(2):147–225.Crossref, Google Scholar
Stroock DW, Varadhan SS (1997) Multidimensional Diffusion Processes, vol. 233 (Springer Science & Business Media, New York).Google Scholar
Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA; London).Google Scholar
Williams D (1991) Probability with Martingales (Cambridge University Press).Crossref, Google Scholar
Yang S (2025) Code for exponential concentration in stochastic approximation. Accessed January 9, 2025, https://github.com/Shangda-Yang/PSGD.Google Scholar

Volume 74, Issue 1

January-February 2026

Pages iii-vii, 1-571, C2-C3

Article Information

Supplemental Material

Metrics

Information

Received:August 03, 2023
Accepted:February 10, 2025
Published Online:April 22, 2025

Cite as

Kody J. H. Law, Neil Walton, Shangda Yang (2025) Exponential Concentration in Stochastic Approximation. Operations Research 74(1):446-463.

https://doi.org/10.1287/opre.2023.0425

Keywords

Acknowledgements

The authors thank the reviewers for their help in improving this paper. The authors are particularly grateful to a referee for the reference Davis et al. (2019), which led to Theorem 5. Also, thanks to Stavros Zenios for advice on projection algorithms and Matthias Troffaes for help with pycddlib.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Exponential Concentration in Stochastic Approximation

References

Volume 74, Issue 1

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News