Exponential Concentration in Stochastic Approximation

Published Online:https://doi.org/10.1287/opre.2023.0425

References

  • Asmussen S (2003) Applied Probability and Queues, vol. 2 (Springer, New York).Google Scholar
  • Bertsekas D (2015) Convex Optimization Algorithms (Athena Scientific, Belmont, MA).Google Scholar
  • Bertsimas D, Gamarnik D, Tsitsiklis JN (2001) Performance of multiclass Markovian queueing networks via piecewise linear Lyapunov functions. Ann. Appl. Probab. 11(4):1384–1428.Google Scholar
  • Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. Siam Rev. 60(2):223–311.CrossrefGoogle Scholar
  • Bregman LM (1967) The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys. 7(3):200–217.CrossrefGoogle Scholar
  • Buche R, Kushner HJ (2002) Rate of convergence for constrained stochastic approximation algorithms. SIAM J. Control Optim. 40(4):1011–1041.CrossrefGoogle Scholar
  • Censor Y, Zenios S (1997) Parallel Optimization: Theory, Algorithms, and Applications, Numerical Mathematics and Scientific Computation (Oxford University Press, Oxford, UK).Google Scholar
  • Chen Z, Mou S, Maguluri ST (2022) Stationary behavior of constant stepsize SGD type algorithms: An asymptotic characterization. Proc. ACM Measurement Anal. Comput. Systems 6(1):1–24.CrossrefGoogle Scholar
  • Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.CrossrefGoogle Scholar
  • Davis D, Drusvyatskiy D (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.CrossrefGoogle Scholar
  • Davis D, Drusvyatskiy D, Charisopoulos V (2019) Stochastic algorithms with geometric step decay converge linearly on sharp functions. Math. Programming 207:145–190.CrossrefGoogle Scholar
  • Davis D, Drusvyatskiy D, Jiang L (2023) Asymptotic normality and optimality in nonsmooth stochastic approximation. Ann Statist. 52(4):1485–1508.Google Scholar
  • Dembo A, Zeitouni O (2009) Large Deviations Techniques and Applications (Springer, New York).Google Scholar
  • Dos Santos LT (1987) A parallel subgradient projections method for the convex feasibility problem. J. Comput. Appl. Math. 18(3):307–320.CrossrefGoogle Scholar
  • Duchi J, Shalev-Shwartz S, Singer Y, Chandra T (2008) Efficient projections onto the l 1-ball for learning in high dimensions. Proc. 25th Internat. Conf. Machine Learn. (ACM, New York), 272–279.Google Scholar
  • Duchi JC, Ruan F (2021) Asymptotic optimality in stochastic optimization. Ann. Statist. 49(1):21–48.CrossrefGoogle Scholar
  • Fabian V (1967) Stochastic approximation of minima with improved asymptotic speed. Ann. Math. Statist. 38(1):191–200.CrossrefGoogle Scholar
  • Fabian V (1968) On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39(4):1327–1332.CrossrefGoogle Scholar
  • Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.CrossrefGoogle Scholar
  • Hajek B (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.CrossrefGoogle Scholar
  • Hájek J (1972) Local asymptotic minimax and admissibility in estimation. Proc. Sixth Berkeley Sympos. Math. Statist. Probab. (Berkeley), vol. 1, 175–194.Google Scholar
  • Harrison JM, Reiman MI (1981) On the distribution of multidimensional reflected Brownian motion. SIAM J. Appl. Math. 41(2):345–361.CrossrefGoogle Scholar
  • Harrison JM, Williams RJ (1987) Multidimensional reflected Brownian motions having exponential stationary distributions. Ann. Probab. 15(1):115–137.Google Scholar
  • Hazan E (2016) Introduction to online convex optimization. Foundations Trends Optim. 2(3–4):157–325.CrossrefGoogle Scholar
  • Hazan E, Kale S (2012) Projection-free online learning. 29th Internat. Conf. Machine Learn., ICML 2012 (ACM, New York), 521–528.Google Scholar
  • Hazan E, Luo H (2016) Variance-reduced and projection-free stochastic optimization. Internat. Conf. Machine Learn. (PMLR, New York), 1263–1271.Google Scholar
  • Iusem AN, De Pierro AR (1990) On the convergence properties of Hildreth’s quadratic programming algorithm. Math. Programming 47(1–3):37–51.CrossrefGoogle Scholar
  • Jaggi M (2013) Revisiting frank-wolfe: Projection-free sparse convex optimization. Internat. Conf. Machine Learn. (PMLR, New York), 427–435.Google Scholar
  • Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.CrossrefGoogle Scholar
  • Kingman JFC (1964) A martingale inequality in the theory of queues. Math. Proc. Cambridge Philos. Soc. 60(2):359–361.CrossrefGoogle Scholar
  • Kushner H, Clark D (2012) Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences (Springer, New York).Google Scholar
  • Kushner H, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications, Stochastic Modelling and Applied Probability (Springer, New York).Google Scholar
  • Le Cam L (1953) On some asymptotic properties of maximum likelihood estimates and related Bayes’ estimates. Univ. Calif. Publ. Statist. 1:277–330.Google Scholar
  • Mandel J (1984) Convergence of the cyclical relaxation method for linear inequalities. Math. Programming 30:218–228.CrossrefGoogle Scholar
  • Meyn SP, Tweedie RL (2012) Markov Chains and Stochastic Stability (Springer Science & Business Media, London).Google Scholar
  • Michelot C (1986) A finite algorithm for finding the projection of a point onto the canonical simplex of αn. J. Optim. Theory Appl. 50:195–200.CrossrefGoogle Scholar
  • Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates, Inc., Red Hoook, NY).Google Scholar
  • Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 400–407.CrossrefGoogle Scholar
  • Ruppert D (1982) Almost sure approximations to the Robbins-Monro and Kiefer-Wolfowitz processes with dependent noise. Ann. Probab. 10(1):178–187.CrossrefGoogle Scholar
  • Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.CrossrefGoogle Scholar
  • Stroock D, Varadhan S (1971) Diffusion processes with boundary conditions. Comm. Pure Appl. Math. 24(2):147–225.CrossrefGoogle Scholar
  • Stroock DW, Varadhan SS (1997) Multidimensional Diffusion Processes, vol. 233 (Springer Science & Business Media, New York).Google Scholar
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA; London).Google Scholar
  • Williams D (1991) Probability with Martingales (Cambridge University Press).CrossrefGoogle Scholar
  • Yang S (2025) Code for exponential concentration in stochastic approximation. Accessed January 9, 2025, https://github.com/Shangda-Yang/PSGD.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.