Rates of Convergence in the Central Limit Theorem for Markov Chains, with an Application to TD Learning
References
- [1] (2019) Normal approximation for stochastic gradient descent via non-asymptotic rates of martingale CLT. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 115–137.Google Scholar
- [2] (1974) Stochastic Differential Equations (John Wiley & Sons, Inc., New York).Google Scholar
- [3] (2003) Applied Probability and Queues (Springer-Verlag, New York).Google Scholar
- [4] (1990) Stein’s method for diffusion approximations. Probab. Theory Related Fields 84(3):297–322.Crossref, Google Scholar
- [5] (2012) Adaptive Algorithms and Stochastic Approximations, Applications of Mathematics, vol. 22 (Springer, Berlin, Heidelberg).Google Scholar
- [6] (1996) Neuro-Dynamic Programming (Athena Scientific, Belmont, MA).Google Scholar
- [7] (2018) A finite time analysis of temporal difference learning with linear function approximation. Proc. 32nd Conf. Learn. Theory, vol. 75 (PMLR, New York), 1691–1692.Google Scholar
- [8] (1982) Exact convergence rates in some martingale central limit theorems. Ann. Probab. 10(3):672–688.Crossref, Google Scholar
- [9] (2020) Stein’s method for normal approximation in Wasserstein distances with application to the multivariate central limit theorem. Probab. Theory Related Fields 178(3–4):827–860.Crossref, Google Scholar
- [10] (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, Texts and Readings in Mathematics, vol. 48 (Hindustan Book Agency, Gurgaon, India).Google Scholar
- [11] (2021) The ODE method for asymptotic statistics in stochastic approximation and reinforcement learning. Preprint, submitted October 27, https://arxiv.org/abs/2110.14427.Google Scholar
- [12] (2014) A short survey of Stein’s method. Preprint, submitted April 4, https://arxiv.org/abs/1404.1392.Google Scholar
- [13] (2010) Normal Approximation by Stein’s Method, Probability and Its Applications (Springer, Berlin, Heidelberg).Google Scholar
- [14] (2023) Concentration of contractive stochastic approximation: Additive and multiplicative noise. Preprint, submitted March 28, https://arxiv.org/abs/2303.15740.Google Scholar
- [15] (2023) A Lyapunov theory for finite-sample guarantees of Markovian stochastic approximation. Oper. Res. 72(4):945–979.Google Scholar
- [16] (2019) Existence of Stein kernels under a spectral gap, and discrepancy bounds. Ann. L’Institut Henri Poincaré Probab. Statist. 55(2):777–790.Google Scholar
- [17] (2020) Bridging the gap between constant step size stochastic gradient descent and Markov chains. Ann. Statist. 48(3):1348–1382.Crossref, Google Scholar
- [18] (2021) Finite-time analysis and restarting scheme for linear two-time-scale stochastic approximation. SIAM J. Control Optim. 59(4):2798–2819.Crossref, Google Scholar
- [19] (2022) Nonlinear two-time-scale stochastic approximation convergence and finite-time performance. IEEE Trans. Automatic Control 68(8):4695–4705.Crossref, Google Scholar
- [20] (2018) Markov Chains, Springer Series in Operations Research and Financial Engineering (Springer, Cham, Switzerland).Crossref, Google Scholar
- [21] (2025) Finite-time high-probability bounds for Polyak-Ruppert averaged iterates of linear stochastic approximation. Ann. Appl. Probab. 35(2):936–982.Google Scholar
- [22] (2019) Multivariate approximations in Wasserstein distance by Stein’s method and Bismut’s formula. Probab. Theory Related Fields 174:945–979.Crossref, Google Scholar
- [23] (2018) Regularity of solutions of the Stein equation and rates in the multivariate central limit theorem. Preprint, submitted May 4, https://arxiv.org/abs/1805.01720.Google Scholar
- [24] (2023) Solution representations for Poisson’s equation, martingale structure, and the Markov chain central limit theorem. Stochastic Systems 14(1):47–68.Link, Google Scholar
- [25] (1996) A Liapounov bound for solutions of the Poisson equation. Ann. Probab. 24(2):916–931.Crossref, Google Scholar
- [26] (1991) On the rate of convergence in the multivariate CLT. Ann. Probab. 19(2):724–739.Crossref, Google Scholar
- [27] (2019) Finite-time performance bounds and adaptive learning rate selection for two time-scale reinforcement learning. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 4704–4713.Google Scholar
- [28] (1982) Hitting-time and occupation-time bounds implied by drift analysis with applications. Adv. Appl. Probab. 14(3):502–525.Crossref, Google Scholar
- [29] (2019) Characterizing the exact behaviors of temporal difference learning algorithms using Markov jump linear system theory. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 8479–8490.Google Scholar
- [30] (2024) Central limit theorem for two-timescale stochastic approximation with Markovian noise: Theory and applications. Preprint, submitted January 17, https://arxiv.org/abs/2401.09339.Google Scholar
- [31] (2023) Bias and extrapolation in Markovian linear stochastic approximation with constant stepsizes. SIGMETRICS‘23: Abstract Proc. 2023 ACM SIGMETRICS Internat. Conf. Measurement Modeling Comput. Systems (Association for Computing Machinery, New York), 81–82.Google Scholar
- [32] (2018) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(223):1–42.Google Scholar
- [33] (2003) Spectral theory and limit theorems for geometrically ergodic Markov processes. Ann. Appl. Probab. 13(1):304–362.Crossref, Google Scholar
- [34] (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
- [35] (2023) The curse of memory in stochastic approximation. Proc. IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 7803–7809.Google Scholar
- [36] (2023) Online statistical inference for nonlinear stochastic approximation with Markovian data. Preprint, submitted February 15, https://arxiv.org/abs/2302.07690.Google Scholar
- [37] (2023) A statistical analysis of Polyak-Ruppert averaged Q-learning. Proc. 26th Internat. Conf. Artificial Intelligence Statist. (PMLR), 2207–2261.Google Scholar
- [38] (2002) The Poisson equation for countable Markov chains: Probabilistic methods and interpretations. Feinberg EA, Shwartz A, eds. Handbook of Markov Decision Processes: Methods and Applications, International Series in Operations Research & Management Science, vol. 40 (Springer, Boston), 269–303.Crossref, Google Scholar
- [39] (1984) Applications of a Kushner and Clark lemma to general classes of stochastic algorithms. IEEE Trans. Inform. Theory 30(2):140–151.Crossref, Google Scholar
- [40] (2012) Markov Chains and Stochastic Stability, Communications and Control Engineering Series (Springer, London).Google Scholar
- [41] (2021) Optimal and instance-dependent guarantees for Markovian linear stochastic approximation. Preprint, submitted December 23, https://arxiv.org/abs/2112.12770.Google Scholar
- [42] (2011) Non-asymptotic analysis of stochastic approximation algorithms for machine learning. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. NIPS’11: Proc. 25th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 451–459.Google Scholar
- [43] (2020) Least squares regression with Markovian data: Fundamental limits and algorithms. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 16666–16676.Google Scholar
- [44] (1990) A new method of stochastic approximation type. Avtomatika i Telemekhanika 7:98–107.Google Scholar
- [45] (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
- [46] (1977) Conditions for geometric ergodicity of countable Markov chains. Doklady Akademii Nauk SSSR 234(2):316–319. Google Scholar
- [47] (2018) On quantitative bounds in the mean martingale central limit theorem. Statist. Probab. Lett. 138:171–176.Crossref, Google Scholar
- [48] (2011) Fundamentals of Stein’s method. Probab. Surveys 8:210–293.Crossref, Google Scholar
- [49] (2013) Applied Probability Models with Optimization Applications (Dover Publications, New York).Google Scholar
- [50] (2023) Online covariance estimation for stochastic gradient descent under Markovian sampling. Preprint, submitted August 3, https://arxiv.org/abs/2308.01481.Google Scholar
- [51] (1988) Efficient estimations from a slowly convergent Robbins-Monro process. Technical report, Cornell University School of Operations Research and Industrial Engineering, Ithaca, NY.Google Scholar
- [52] (1994) Strengthening ergodicity to geometric ergodicity for Markov chains. Stochastic Models 10(1):45–74.Crossref, Google Scholar
- [53] (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. 32nd Conf. Learn. Theory, vol. 99 (PMLR, New York), 2803–2830.Google Scholar
- [54] (1972) A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. Le Cam LM, Neyman J, Scott EL, eds. Proc. 6th Berkeley Sympos. Math. Statist. Probab. (University of California Press, Berkeley, CA), 583–603.Google Scholar
- [55] (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).Google Scholar
- [56] (2022) Stochastic linear optimization never overfits with quadratically-bounded losses on general data. Proc. 35th Conf. Learn. Theory, vol. 178 (PMLR, New York), 5453–5488.Google Scholar
- [57] (1996) Analysis of temporal-difference learning with function approximation. Jordan MI, Petsche T, eds. NIPS’96: Proc. 10th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 1075–1081.Google Scholar
- [58] (2023) Tight finite time bounds of two-time-scale linear stochastic approximation with Markovian noise. Preprint, submitted December 31, https://arxiv.org/abs/2401.00364.Google Scholar
- [59] (2018) A high-dimensional CLT in W2 distance with near optimal convergence rate. Probab. Theory Related Fields 170:821–845.Crossref, Google Scholar

