A Concentration Bound for TD(0) with Function Approximation
Published Online:26 Dec 2025https://doi.org/10.1287/stsy.2023.0055
References
- (2017) Minimax regret bounds for reinforcement learning. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 70 (PMLR, New York), 263–272.Google Scholar
- (2018) A finite time analysis of temporal difference learning with linear function approximation. Proc. 31st Conf. Learn. Theory (PMLR, New York), 1691–1692.Google Scholar
- (1991) Topics in Controlled Markov Chains, Pitman Research Notes in Mathematics Series, vol. 240 (Longman Scientific & Technical, Harlow, Essex, UK).Google Scholar
- (2002) On the lock-in probability of stochastic approximation. Combin. Probab. Comput. 11(1):11–20.Google Scholar
- (2022) Corrigendum to “A concentration bound for contractive stochastic approximation” [Syst. Control Lett. 153 (2021) 104947]. Systems Control Lett. 153:104947.Google Scholar
- (2022) Concentration of contractive stochastic approximation and reinforcement learning. Stochastic Systems 12(4):411–430.Link, Google Scholar
- (2023) A concentration bound for LSPE(λ). Systems Control Lett. 171:105418.Google Scholar
- (2025) Concentration of contractive stochastic approximation: Additive and multiplicative noise. Ann. Appl. Probab. 35(2):1298–1352.Google Scholar
- (2020) Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 8223–8234.Google Scholar
- (2021) A Lyapunov theory for finite-sample guarantees of asynchronous Q-learning and TD-learning variants. Preprint, submitted February 2, https://arxiv.org/abs/2102.01567.Google Scholar
- (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Google Scholar
- (2018) Finite sample analyses for TD(0) with function approximation. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
- (2003) Learning rates for Q-learning. J. Machine Learn. Res. 5:1–25.Google Scholar
- (2018) Is Q-learning provably efficient? Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 4868–4878.Google Scholar
- (2010) On the convergence, lock-in probability, and sample complexity of stochastic approximation. SIAM J. Control Optim. 48(8):5178–5192.Google Scholar
- (2023) Is Q-learning minimax optimal? A tight sample complexity analysis. Oper. Res. 72(1):222–236.Google Scholar
- (2009) Exponential inequalities for martingales and asymptotic properties of the free energy of directed polymers in a random environment. Stochastic Processes Their Appl. 119(10):3101–3132.Google Scholar
- (1972) Simplified description of slow Markov walks. Part II. Automation Remote Control 33(5):761.Google Scholar
- (2023) Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation. Proc. 26th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 206 (PMLR, New York), 5438–5448.Google Scholar
- (2021) Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling. Machine Learn. 110(3):559–618.Google Scholar
- (2020) Finite-time analysis of asynchronous stochastic approximation and Q-learning. Proc. 33rd Conf. Learn. Theory (PMLR, New York), 3185–3205.Google Scholar
- (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
- (2015) Random matrices: Universality of local spectral statistics of non-Hermitian matrices. Ann. Probab. 43(2):782 –874.Google Scholar
- (2019) A concentration bound for stochastic approximation via Alekseev’s formula. Stochastic Systems 9(1):1–26.Link, Google Scholar
- (1997) An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control 42(5):674–690.Google Scholar
- (2019) Sample-optimal parametric Q-learning using linearly additive features. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 6995–7004.Google Scholar
- (2020) On function approximation in reinforcement learning: Optimism in the face of large state spaces. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 13903–13916.Google Scholar

