Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

Alain Durmus
Alain Durmus
[email protected]
Department of Applied Mathematics, Ecole Polytechnique, 91128 Palaiseau, France
Search for more papers by this author
,
Eric Moulines
Corresponding Author
Eric Moulines
[email protected]
https://orcid.org/0000-0002-2058-0693
Department of Applied Mathematics, Ecole Polytechnique, 91128 Palaiseau, France
Search for more papers by this author
,
Alexey Naumov
Alexey Naumov
[email protected]
https://orcid.org/0000-0002-7536-4576
HSE University and Steklov Mathematical Institute of Russian Academy of Sciences, Moscow 101000, Russian Federation
Search for more papers by this author
,
Sergey Samsonov
Sergey Samsonov
[email protected]
https://orcid.org/0000-0002-0203-2028
HSE University and Institute for Information Transmission Problems, Moscow 101000, Russian Federation
Search for more papers by this author

Alain Durmus

[email protected]

Department of Applied Mathematics, Ecole Polytechnique, 91128 Palaiseau, France

Search for more papers by this author

Eric Moulines

Corresponding Author

Eric Moulines

[email protected]

https://orcid.org/0000-0002-2058-0693

Department of Applied Mathematics, Ecole Polytechnique, 91128 Palaiseau, France

Search for more papers by this author

Alexey Naumov

[email protected]

https://orcid.org/0000-0002-7536-4576

HSE University and Steklov Mathematical Institute of Russian Academy of Sciences, Moscow 101000, Russian Federation

Search for more papers by this author

Sergey Samsonov

[email protected]

https://orcid.org/0000-0002-0203-2028

HSE University and Institute for Information Transmission Problems, Moscow 101000, Russian Federation

Search for more papers by this author

Published Online:16 Apr 2024https://doi.org/10.1287/moor.2022.0179

References

[1] Aguech R, Moulines E, Priouret P (2000) On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM J. Control Optim. 39(3):872–899.Crossref, Google Scholar
[2] Anderson GD, Qiu SL (1997) A monotonicity property of the gamma function. Proc. Amer. Math. Soc. 125(11):3355–3362.Crossref, Google Scholar
[3] Bach F, Moulines E (2013) Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n). Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
[4] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, vol. 22 (Springer Science & Business Media, New York).Google Scholar
[5] Bertsekas D (2019) Reinforcement Learning and Optimal Control (Athena Scientific, Nashua, NH).Google Scholar
[6] Bertsekas D, Tsitsiklis J (2015) Parallel and Distributed Computation: Numerical Methods (Athena Scientific, Nashua, NH).Google Scholar
[7] Bhandari J, Russo D, Singal R (2021) A finite time analysis of temporal difference learning with linear function approximation. Oper. Res. 69(3):950–973.Link, Google Scholar
[8] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[9] Bottou L, Curtis FE, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60(2):223–311.Crossref, Google Scholar
[10] Chen S, Devraj A, Busic A, Meyn S (2020) Explicit mean-square error bounds for Monte-Carlo and linear stochastic approximation. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 4173–4183.Google Scholar
[11] Dalal G, Szorenyi B, Thoppe G (2020) A tale of two-timescale reinforcement learning with the tightest finite-time bound. Proc. Conf. AAAI Artificial Intelligence, vol. 34 (AAAI Press, Palo Alto, CA), 3701–3708.Crossref, Google Scholar
[12] Douc R, Moulines E, Priouret P, Soulier P (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).Crossref, Google Scholar
[13] Durmus A, Moulines E, Naumov A, Samsonov S, Sheshukova M (2023) Rosenthal-type inequalities for linear statistics of Markov chains. Preprint, submitted March 10, https://arxiv.org/abs/2303.05838.Google Scholar
[14] Durmus A, Moulines E, Naumov A, Samsonov S, Wai HT (2021) On the stability of random matrix product with Markovian noise: Application to linear stochastic approximation and TD learning. Belkin M, Kpotufe S, eds. Proc. Thirty-Fourth Conf. Learn. Theory, vol. 134 (PMLR, New York), 1711–1752.Google Scholar
[15] Durmus A, Moulines E, Naumov A, Samsonov S, Scaman K, Wai HT (2021) Tight high probability bounds for linear stochastic approximation with fixed stepsize. Ranzato M, Beygelzimer A, Nguyen K, Liang PS, Vaughan JW, Dauphin Y, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates, Inc., Red Hook, NY), 30063–30074.Google Scholar
[16] Eweda E, Macchi O (1983) Quadratic mean and almost-sure convergence of unbounded stochastic approximation algorithms with correlated observations. Ann. l’Institut Henri Poincare (B) Probab. Statist. 19(3):235–255.Google Scholar
[17] Guo L, Ljung L (1995) Exponential stability of general tracking algorithms. IEEE Trans. Automatic Control 40(8):1376–1387.Crossref, Google Scholar
[18] Hiai F, Petz D (2014) Introduction to Matrix Analysis and Applications (Springer International Publishing, Cham, Switzerland).Crossref, Google Scholar
[19] Huang D, Niles-Weed J, Tropp JA, Ward R (2021) Matrix concentration for products. Foundations Comput. Math. 22(6):1–33.Google Scholar
[20] Jain P, Nagaraj D, Netrapalli P (2019) Making the last iterate of sgd information theoretically optimal. Beygelzimer A, Hsu D, eds. Proc. Thirty-Second Conf. Learn. Theory, vol. 99 (PMLR, New York), 1752–1755.Google Scholar
[21] Jain P, Kakade S, Kidambi R, Netrapalli P, Sidford A (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(2018):1–42.Google Scholar
[22] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Accelerating stochastic gradient descent for least squares regression. Proc. 31st Conf. Learn. Theory (PMLR, New York), 545–604.Google Scholar
[23] Joulin A, Ollivier Y (2010) Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38(6):2418–2442.Crossref, Google Scholar
[24] Kushner H, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications, vol. 35 (Springer Science & Business Media, New York).Google Scholar
[25] Lakshminarayanan C, Szepesvari C (2018) Linear stochastic approximation: How far does constant step-size and iterate averaging go? Storkey A, Perez-Cruz F, eds. Proc. Twenty-First Internat. Conf. Artificial Intelligence Statist., vol. 84 (PMLR, New York), 1347–1355.Google Scholar
[26] Lauand CK, Meyn S (2022) Bias in stochastic approximation cannot be eliminated with averaging. 2022 58th Annual Allerton Conf. Comm. Control Comput. (Allerton) (IEEE Press, Piscataway, NJ), 1–4.Google Scholar
[27] Ljung L (2002) Recursive identification algorithms. Circuits Systems Signal Processing 21(1):57–68.Crossref, Google Scholar
[28] Mou W, Pananjady A, Wainwright MJ, Bartlett PL (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
[29] Mou W, Li CJ, Wainwright MJ, Bartlett PL, Jordan MI (2020) On linear stochastic approximation: Fine-grained Polyak–Ruppert and non-asymptotic concentration. Proc. Thirty Third Conf. Learn. Theory, vol. 125 (PMLR, New York), 2947–2997.Google Scholar
[30] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[31] Osekowski A (2012) Sharp Martingale and Semimartingale Inequalities, 1st ed., Monografie Matematyczne 72 (Birkhäuser, Basel, Switzerland).Crossref, Google Scholar
[32] Paulin D (2015) Concentration inequalities for Markov chains by Marton couplings and spectral methods. Electronic J. Probab. 20(2015):1–32.Google Scholar
[33] Pinelis I (1994) Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probab. 22(4):1679–1706.Crossref, Google Scholar
[34] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
[35] Rakhlin A, Shamir O, Sridharan K (2012) Making gradient descent optimal for strongly convex stochastic optimization. Proc. 29th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 1571–1578.Google Scholar
[36] Rio E (2017) Asymptotic Theory of Weakly Dependent Random Processes, vol. 80 (Springer, Cham, Switzerland).Crossref, Google Scholar
[37] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
[38] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. Thirty-Second Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
[39] Sutton RS (1988) Learning to predict by the methods of temporal differences. Machine Learn. 3(1):9–44.Crossref, Google Scholar
[40] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[41] Watkins CJ, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
[42] Widrow B, Stearns SD (1985) Adaptive Signal Processing (Prentice-Hall, Englewood Cliffs, NJ).Google Scholar

cover image Mathematics of Operations Research

Volume 50, Issue 2

May 2025

Pages iii, 783-1583

Article Information

Metrics

Information

Received:July 05, 2022
Accepted:January 12, 2024
Published Online:April 16, 2024

Cite as

Alain Durmus; , Eric Moulines; , Alexey Naumov; , Sergey Samsonov (2024) Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation. Mathematics of Operations Research 50(2):935-964.

https://doi.org/10.1287/moor.2022.0179

Keywords

Acknowledgments

Part of the work was prepared under the auspices of the Lagrange Center for Mathematics and Computing.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Finite-Time High-Probability Bounds for Polyak–Ruppert Averaged Iterates of Linear Stochastic Approximation

References

Volume 50, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News