Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

R. Srikant
R. Srikant
[email protected]
https://orcid.org/0000-0003-1483-5204
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801; and Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801
Search for more papers by this author

Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801; and Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801

Search for more papers by this author

Published Online:3 Oct 2025https://doi.org/10.1287/moor.2024.0444

References

[1] Anastasiou A, Balasubramanian K, Erdogdu MA (2019) Normal approximation for stochastic gradient descent via non-asymptotic rates of martingale CLT. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 115–137.Google Scholar
[2] Arnold L (1974) Stochastic Differential Equations (John Wiley & Sons, Inc., New York).Google Scholar
[3] Asmussen S (2003) Applied Probability and Queues (Springer-Verlag, New York).Google Scholar
[4] Barbour AD (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84(3):297–322.Crossref, Google Scholar
[5] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics, vol. 22 (Springer, Berlin, Heidelberg).Google Scholar
[6] Bertsekas D, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
[7] Bhandari J, Russo D, Singal R (2018) A finite time analysis of temporal difference learning with linear function approximation. Proc. 32nd Conf. Learn. Theory, vol. 75 (PMLR, New York), 1691–1692.Google Scholar
[8] Bolthausen E (1982) Exact convergence rates in some martingale central limit theorems. Ann. Probab. 10(3):672–688.Crossref, Google Scholar
[9] Bonis T (2020) Stein’s method for normal approximation in Wasserstein distances with application to the multivariate central limit theorem. Probab. Theory Related Fields 178(3–4):827–860.Crossref, Google Scholar
[10] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, Texts and Readings in Mathematics, vol. 48 (Hindustan Book Agency, Gurgaon, India).Google Scholar
[11] Borkar V, Chen S, Devraj A, Kontoyiannis I, Meyn S (2021) The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning. Preprint, submitted October 27, https://arxiv.org/abs/2110.14427.Google Scholar
[12] Chatterjee S (2014) A short survey of Stein’s method. Preprint, submitted April 4, https://arxiv.org/abs/1404.1392.Google Scholar
[13] Chen LHY, Goldstein L, Shao Q-M (2010) Normal Approximation by Stein’s Method, Probability and Its Applications (Springer, Berlin, Heidelberg).Google Scholar
[14] Chen Z, Maguluri ST, Zubeldia M (2023) Concentration of contractive stochastic approximation: Additive and multiplicative noise. Preprint, submitted March 28, https://arxiv.org/abs/2303.15740.Google Scholar
[15] Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2023) A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation. Oper. Res. 72(4):945–979.Google Scholar
[16] Courtade TA, Fathi M, Pananjady A (2019) Existence of Stein kernels under a spectral gap, and discrepancy bounds. Ann. L’Institut Henri Poincaré Probab. Statist. 55(2):777–790.Google Scholar
[17] Dieuleveut A, Durmus A, Bach F (2020) Bridging the gap between constant step size stochastic gradient descent and Markov chains. Ann. Statist. 48(3):1348–1382.Crossref, Google Scholar
[18] Doan TT (2021) Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation. SIAM J. Control Optim. 59(4):2798–2819.Crossref, Google Scholar
[19] Doan TT (2022) Nonlinear two-time-scale stochastic approximation convergence and finite-time performance. IEEE Trans. Automatic Control 68(8):4695–4705.Crossref, Google Scholar
[20] Douc R, Moulines E, Priouret P, Soulier P (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).Crossref, Google Scholar
[21] Durmus A, Moulines E, Naumov A, Samsonov S (2025) Finite-time high-probability bounds for Polyak-Ruppert averaged iterates of linear stochastic approximation. Ann. Appl. Probab. 35(2):936–982.Google Scholar
[22] Fang X, Shao Q-M, Xu L (2019) Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Related Fields 174:945–979.Crossref, Google Scholar
[23] Gallouët T, Mijoule G, Swan Y (2018) Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem. Preprint, submitted May 4, https://arxiv.org/abs/1805.01720.Google Scholar
[24] Glynn PW, Infanger A (2023) Solution representations for Poisson’s equation, martingale structure, and the Markov chain central limit theorem. Stochastic Systems 14(1):47–68.Link, Google Scholar
[25] Glynn PW, Meyn SP (1996) A Liapounov bound for solutions of the Poisson equation. Ann. Probab. 24(2):916–931.Crossref, Google Scholar
[26] Gotze F (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2):724–739.Crossref, Google Scholar
[27] Gupta H, Srikant R, Ying L (2019) Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4704–4713.Google Scholar
[28] Hajek B (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.Crossref, Google Scholar
[29] Hu B, Syed U (2019) Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8479–8490.Google Scholar
[30] Hu J, Doshi V, Young Eun D (2024) Central limit theorem for two-timescale stochastic approximation with Markovian noise: Theory and applications. Preprint, submitted January 17, https://arxiv.org/abs/2401.09339.Google Scholar
[31] Huo D, Chen Y, Xie Q (2023) Bias and extrapolation in Markovian linear stochastic approximation with constant stepsizes. SIGMETRICS‘23: Abstract Proc. 2023 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 81–82.Google Scholar
[32] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(223):1–42.Google Scholar
[33] Kontoyiannis I, Meyn SP (2003) Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13(1):304–362.Crossref, Google Scholar
[34] Kushner HJ, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
[35] Lauand CK, Meyn S (2023) The curse of memory in stochastic approximation. Proc. IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 7803–7809.Google Scholar
[36] Li X, Liang J, Zhang Z (2023) Online statistical inference for nonlinear stochastic approximation with Markovian data. Preprint, submitted February 15, https://arxiv.org/abs/2302.07690.Google Scholar
[37] Li X, Yang W, Liang J, Zhang Z, Jordan MI (2023) A statistical analysis of Polyak-Ruppert averaged Q-learning. Proc. 26th Internat. Conf. Artificial Intelligence Statist. (PMLR), 2207–2261.Google Scholar
[38] Makowski AM, Shwartz A (2002) The Poisson equation for countable Markov chains: Probabilistic methods and interpretations. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes: Methods and Applications, International Series in Operations Research & Management Science, vol. 40 (Springer, Boston), 269–303.Crossref, Google Scholar
[39] Metivier M, Priouret P (1984) Applications of a Kushner and Clark lemma to general classes of stochastic algorithms. IEEE Trans. Inform. Theory 30(2):140–151.Crossref, Google Scholar
[40] Meyn SP, Tweedie RL (2012) Markov Chains and Stochastic Stability, Communications and Control Engineering Series (Springer, London).Google Scholar
[41] Mou W, Pananjady A, Wainwright MJ, Bartlett PL (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
[42] Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. NIPS’11: Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 451–459.Google Scholar
[43] Nagaraj D, Wu X, Bresler G, Jain P, Netrapalli P (2020) Least squares regression with Markovian data: Fundamental limits and algorithms. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 16666–16676.Google Scholar
[44] Polyak BT (1990) A new method of stochastic approximation type. Avtomatika i Telemekhanika 7:98–107.Google Scholar
[45] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
[46] Popov NN (1977) Conditions for geometric ergodicity of countable Markov chains. Doklady Akademii Nauk SSSR 234(2):316–319. Google Scholar
[47] Röllin A (2018) On quantitative bounds in the mean martingale central limit theorem. Statist. Probab. Lett. 138:171–176.Crossref, Google Scholar
[48] Ross N (2011) Fundamentals of Stein’s method. Probab. Surveys 8:210–293.Crossref, Google Scholar
[49] Ross SM (2013) Applied Probability Models with Optimization Applications (Dover Publications, New York).Google Scholar
[50] Roy A, Balasubramanian K (2023) Online covariance estimation for stochastic gradient descent under Markovian sampling. Preprint, submitted August 3, https://arxiv.org/abs/2308.01481.Google Scholar
[51] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University School of Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
[52] Spieksma FM, Tweedie RL (1994) Strengthening ergodicity to geometric ergodicity for Markov chains. Stochastic Models 10(1):45–74.Crossref, Google Scholar
[53] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. 32nd Conf. Learn. Theory, vol. 99 (PMLR, New York), 2803–2830.Google Scholar
[54] Stein C (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Le Cam LM, Neyman J, Scott EL, eds. Proc. 6th Berkeley Sympos. Math. Statist. Probab. (University of California Press, Berkeley, CA), 583–603.Google Scholar
[55] Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
[56] Telgarsky M (2022) Stochastic linear optimization never overfits with quadratically-bounded losses on general data. Proc. 35th Conf. Learn. Theory, vol. 178 (PMLR, New York), 5453–5488.Google Scholar
[57] Tsitsiklis J, Van Roy B (1996) Analysis of temporal-difference learning with function approximation. Jordan MI, Petsche T, eds. NIPS’96: Proc. 10th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
[58] Ul Haque S, Khodadadian S, Maguluri ST (2023) Tight finite time bounds of two-time-scale linear stochastic approximation with Markovian noise. Preprint, submitted December 31, https://arxiv.org/abs/2401.00364.Google Scholar
[59] Zhai A (2018) A high-dimensional CLT in W2 distance with near optimal convergence rate. Probab. Theory Related Fields 170:821–845.Crossref, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:March 12, 2024
Accepted:June 01, 2025
Published Online:October 03, 2025

Cite as

R. Srikant (2025) Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0444

Keywords

Acknowledgments

The author thanks Professor Siva Theja Maguluri, Georgia Tech, for many stimulating discussions and drawing his attention to several references during the course of this work; Professor Sean Meyn, University of Florida, for pointing out the bias issue with fixed step-size TD learning and why it cannot be eliminated by averaging and other discussions; Professor Mehrdad Moharrami, University of Iowa, for carefully reading an earlier version of the paper and providing many useful comments; and Weichen Wu, CMU, for pointing out a missing factor in Theorem 4 in an earlier version of the paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News