Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning

Published Online:https://doi.org/10.1287/moor.2024.0444

References

  • [1] Anastasiou A, Balasubramanian K, Erdogdu MA (2019) Normal approximation for stochastic gradient descent via non-asymptotic rates of martingale CLT. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 115–137.Google Scholar
  • [2] Arnold L (1974) Stochastic Differential Equations (John Wiley & Sons, Inc., New York).Google Scholar
  • [3] Asmussen S (2003) Applied Probability and Queues (Springer-Verlag, New York).Google Scholar
  • [4] Barbour AD (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84(3):297–322.CrossrefGoogle Scholar
  • [5] Benveniste A, Métivier M, Priouret P (2012) Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics, vol. 22 (Springer, Berlin, Heidelberg).Google Scholar
  • [6] Bertsekas D, Tsitsiklis JN (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
  • [7] Bhandari J, Russo D, Singal R (2018) A finite time analysis of temporal difference learning with linear function approximation. Proc. 32nd Conf. Learn. Theory, vol. 75 (PMLR, New York), 1691–1692.Google Scholar
  • [8] Bolthausen E (1982) Exact convergence rates in some martingale central limit theorems. Ann. Probab. 10(3):672–688.CrossrefGoogle Scholar
  • [9] Bonis T (2020) Stein’s method for normal approximation in Wasserstein distances with application to the multivariate central limit theorem. Probab. Theory Related Fields 178(3–4):827–860.CrossrefGoogle Scholar
  • [10] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, Texts and Readings in Mathematics, vol. 48 (Hindustan Book Agency, Gurgaon, India).Google Scholar
  • [11] Borkar V, Chen S, Devraj A, Kontoyiannis I, Meyn S (2021) The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning. Preprint, submitted October 27, https://arxiv.org/abs/2110.14427.Google Scholar
  • [12] Chatterjee S (2014) A short survey of Stein’s method. Preprint, submitted April 4, https://arxiv.org/abs/1404.1392.Google Scholar
  • [13] Chen LHY, Goldstein L, Shao Q-M (2010) Normal Approximation by Stein’s Method, Probability and Its Applications (Springer, Berlin, Heidelberg).Google Scholar
  • [14] Chen Z, Maguluri ST, Zubeldia M (2023) Concentration of contractive stochastic approximation: Additive and multiplicative noise. Preprint, submitted March 28, https://arxiv.org/abs/2303.15740.Google Scholar
  • [15] Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2023) A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation. Oper. Res. 72(4):945–979.Google Scholar
  • [16] Courtade TA, Fathi M, Pananjady A (2019) Existence of Stein kernels under a spectral gap, and discrepancy bounds. Ann. L’Institut Henri Poincaré Probab. Statist. 55(2):777–790.Google Scholar
  • [17] Dieuleveut A, Durmus A, Bach F (2020) Bridging the gap between constant step size stochastic gradient descent and Markov chains. Ann. Statist. 48(3):1348–1382.CrossrefGoogle Scholar
  • [18] Doan TT (2021) Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation. SIAM J. Control Optim. 59(4):2798–2819.CrossrefGoogle Scholar
  • [19] Doan TT (2022) Nonlinear two-time-scale stochastic approximation convergence and finite-time performance. IEEE Trans. Automatic Control 68(8):4695–4705.CrossrefGoogle Scholar
  • [20] Douc R, Moulines E, Priouret P, Soulier P (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [21] Durmus A, Moulines E, Naumov A, Samsonov S (2025) Finite-time high-probability bounds for Polyak-Ruppert averaged iterates of linear stochastic approximation. Ann. Appl. Probab. 35(2):936–982.Google Scholar
  • [22] Fang X, Shao Q-M, Xu L (2019) Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Related Fields 174:945–979.CrossrefGoogle Scholar
  • [23] Gallouët T, Mijoule G, Swan Y (2018) Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem. Preprint, submitted May 4, https://arxiv.org/abs/1805.01720.Google Scholar
  • [24] Glynn PW, Infanger A (2023) Solution representations for Poisson’s equation, martingale structure, and the Markov chain central limit theorem. Stochastic Systems 14(1):47–68.LinkGoogle Scholar
  • [25] Glynn PW, Meyn SP (1996) A Liapounov bound for solutions of the Poisson equation. Ann. Probab. 24(2):916–931.CrossrefGoogle Scholar
  • [26] Gotze F (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2):724–739.CrossrefGoogle Scholar
  • [27] Gupta H, Srikant R, Ying L (2019) Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4704–4713.Google Scholar
  • [28] Hajek B (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.CrossrefGoogle Scholar
  • [29] Hu B, Syed U (2019) Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8479–8490.Google Scholar
  • [30] Hu J, Doshi V, Young Eun D (2024) Central limit theorem for two-timescale stochastic approximation with Markovian noise: Theory and applications. Preprint, submitted January 17, https://arxiv.org/abs/2401.09339.Google Scholar
  • [31] Huo D, Chen Y, Xie Q (2023) Bias and extrapolation in Markovian linear stochastic approximation with constant stepsizes. SIGMETRICS‘23: Abstract Proc. 2023 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 81–82.Google Scholar
  • [32] Jain P, Kakade SM, Kidambi R, Netrapalli P, Sidford A (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(223):1–42.Google Scholar
  • [33] Kontoyiannis I, Meyn SP (2003) Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13(1):304–362.CrossrefGoogle Scholar
  • [34] Kushner HJ, Yin G (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
  • [35] Lauand CK, Meyn S (2023) The curse of memory in stochastic approximation. Proc. IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 7803–7809.Google Scholar
  • [36] Li X, Liang J, Zhang Z (2023) Online statistical inference for nonlinear stochastic approximation with Markovian data. Preprint, submitted February 15, https://arxiv.org/abs/2302.07690.Google Scholar
  • [37] Li X, Yang W, Liang J, Zhang Z, Jordan MI (2023) A statistical analysis of Polyak-Ruppert averaged Q-learning. Proc. 26th Internat. Conf. Artificial Intelligence Statist. (PMLR), 2207–2261.Google Scholar
  • [38] Makowski AM, Shwartz A (2002) The Poisson equation for countable Markov chains: Probabilistic methods and interpretations. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes: Methods and Applications, International Series in Operations Research & Management Science, vol. 40 (Springer, Boston), 269–303.CrossrefGoogle Scholar
  • [39] Metivier M, Priouret P (1984) Applications of a Kushner and Clark lemma to general classes of stochastic algorithms. IEEE Trans. Inform. Theory 30(2):140–151.CrossrefGoogle Scholar
  • [40] Meyn SP, Tweedie RL (2012) Markov Chains and Stochastic Stability, Communications and Control Engineering Series (Springer, London).Google Scholar
  • [41] Mou W, Pananjady A, Wainwright MJ, Bartlett PL (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
  • [42] Moulines E, Bach F (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. NIPS’11: Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 451–459.Google Scholar
  • [43] Nagaraj D, Wu X, Bresler G, Jain P, Netrapalli P (2020) Least squares regression with Markovian data: Fundamental limits and algorithms. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 16666–16676.Google Scholar
  • [44] Polyak BT (1990) A new method of stochastic approximation type. Avtomatika i Telemekhanika 7:98–107.Google Scholar
  • [45] Polyak BT, Juditsky AB (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.CrossrefGoogle Scholar
  • [46] Popov NN (1977) Conditions for geometric ergodicity of countable Markov chains. Doklady Akademii Nauk SSSR 234(2):316–319. Google Scholar
  • [47] Röllin A (2018) On quantitative bounds in the mean martingale central limit theorem. Statist. Probab. Lett. 138:171–176.CrossrefGoogle Scholar
  • [48] Ross N (2011) Fundamentals of Stein’s method. Probab. Surveys 8:210–293.CrossrefGoogle Scholar
  • [49] Ross SM (2013) Applied Probability Models with Optimization Applications (Dover Publications, New York).Google Scholar
  • [50] Roy A, Balasubramanian K (2023) Online covariance estimation for stochastic gradient descent under Markovian sampling. Preprint, submitted August 3, https://arxiv.org/abs/2308.01481.Google Scholar
  • [51] Ruppert D (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University School of Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
  • [52] Spieksma FM, Tweedie RL (1994) Strengthening ergodicity to geometric ergodicity for Markov chains. Stochastic Models 10(1):45–74.CrossrefGoogle Scholar
  • [53] Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. 32nd Conf. Learn. Theory, vol. 99 (PMLR, New York), 2803–2830.Google Scholar
  • [54] Stein C (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Le Cam LM, Neyman J, Scott EL, eds. Proc. 6th Berkeley Sympos. Math. Statist. Probab. (University of California Press, Berkeley, CA), 583–603.Google Scholar
  • [55] Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
  • [56] Telgarsky M (2022) Stochastic linear optimization never overfits with quadratically-bounded losses on general data. Proc. 35th Conf. Learn. Theory, vol. 178 (PMLR, New York), 5453–5488.Google Scholar
  • [57] Tsitsiklis J, Van Roy B (1996) Analysis of temporal-difference learning with function approximation. Jordan MI, Petsche T, eds. NIPS’96: Proc. 10th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
  • [58] Ul Haque S, Khodadadian S, Maguluri ST (2023) Tight finite time bounds of two-time-scale linear stochastic approximation with Markovian noise. Preprint, submitted December 31, https://arxiv.org/abs/2401.00364.Google Scholar
  • [59] Zhai A (2018) A high-dimensional CLT in W2 distance with near optimal convergence rate. Probab. Theory Related Fields 170:821–845.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.