Open Access

A Concentration Bound for TD(0) with Function Approximation

Siddharth Chandak
Corresponding Author
Siddharth Chandak
[email protected]
https://orcid.org/0000-0003-3237-7729
Department of Electrical Engineering, Stanford University, Stanford, California 94305
Search for more papers by this author
,
Vivek S. Borkar
Vivek S. Borkar
[email protected]
https://orcid.org/0000-0003-0756-5402
Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India
Search for more papers by this author

Siddharth Chandak

Corresponding Author

Siddharth Chandak

[email protected]

https://orcid.org/0000-0003-3237-7729

Department of Electrical Engineering, Stanford University, Stanford, California 94305

Search for more papers by this author

Vivek S. Borkar

[email protected]

https://orcid.org/0000-0003-0756-5402

Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, Maharashtra 400076, India

Search for more papers by this author

Published Online:26 Dec 2025https://doi.org/10.1287/stsy.2023.0055

References

Azar MG, Osband I, Munos R (2017) Minimax regret bounds for reinforcement learning. Precup D, Teh YW, eds. Proc. 34th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 70 (PMLR, New York), 263–272.Google Scholar
Bhandari J, Russo D, Singal R (2018) A finite time analysis of temporal difference learning with linear function approximation. Proc. 31st Conf. Learn. Theory (PMLR, New York), 1691–1692.Google Scholar
Borkar VS (1991) Topics in Controlled Markov Chains, Pitman Research Notes in Mathematics Series, vol. 240 (Longman Scientific & Technical, Harlow, Essex, UK).Google Scholar
Borkar VS (2002) On the lock-in probability of stochastic approximation. Combin. Probab. Comput. 11(1):11–20.Google Scholar
Borkar VS (2022) Corrigendum to “A concentration bound for contractive stochastic approximation” [Syst. Control Lett. 153 (2021) 104947]. Systems Control Lett. 153:104947.Google Scholar
Chandak S, Borkar VS, Dodhia P (2022) Concentration of contractive stochastic approximation and reinforcement learning. Stochastic Systems 12(4):411–430.Link, Google Scholar
Chandak S, Borkar VS, Dolhare H (2023) A concentration bound for LSPE(λ). Systems Control Lett. 171:105418.Google Scholar
Chen Z, Maguluri ST, Zubeldia M (2025) Concentration of contractive stochastic approximation: Additive and multiplicative noise. Ann. Appl. Probab. 35(2):1298–1352.Google Scholar
Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2020) Finite-sample analysis of contractive stochastic approximation using smooth convex envelopes. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 8223–8234.Google Scholar
Chen Z, Maguluri ST, Shakkottai S, Shanmugam K (2021) A Lyapunov theory for finite-sample guarantees of asynchronous Q-learning and TD-learning variants. Preprint, submitted February 2, https://arxiv.org/abs/2102.01567.Google Scholar
Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Google Scholar
Dalal G, Szörényi B, Thoppe G, Mannor S (2018) Finite sample analyses for TD(0) with function approximation. Proc. AAAI Conf. Artificial Intelligence, vol. 32 (AAAI Press, Palo Alto, CA).Google Scholar
Even-Dar E, Mansour Y (2003) Learning rates for Q-learning. J. Machine Learn. Res. 5:1–25.Google Scholar
Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, eds. NIPS’18: Proc. 32nd Internat. Conf. Neural Inform. Processing Systems, vol. 31 (Curran Associates, Inc., Red Hook, NY), 4868–4878.Google Scholar
Kamal S (2010) On the convergence, lock-in probability, and sample complexity of stochastic approximation. SIAM J. Control Optim. 48(8):5178–5192.Google Scholar
Li G, Cai C, Chen Y, Wei Y, Chi Y (2023) Is Q-learning minimax optimal? A tight sample complexity analysis. Oper. Res. 72(1):222–236.Google Scholar
Liu Q, Watbled F (2009) Exponential inequalities for martingales and asymptotic properties of the free energy of directed polymers in a random environment. Stochastic Processes Their Appl. 119(10):3101–3132.Google Scholar
Meerkov SM (1972) Simplified description of slow Markov walks. Part II. Automation Remote Control 33(5):761.Google Scholar
Patil G, Prashanth LA, Nagaraj D, Precup D (2023) Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation. Proc. 26th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 206 (PMLR, New York), 5438–5448.Google Scholar
Prashanth LA, Korda N, Munos R (2021) Concentration bounds for temporal difference learning with linear function approximation: The case of batch data and uniform sampling. Machine Learn. 110(3):559–618.Google Scholar
Qu G, Wierman A (2020) Finite-time analysis of asynchronous stochastic approximation and Q-learning. Proc. 33rd Conf. Learn. Theory (PMLR, New York), 3185–3205.Google Scholar
Srikant R, Ying L (2019) Finite-time error bounds for linear stochastic approximation and TD learning. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 2803–2830.Google Scholar
Tao T, Vu V (2015) Random matrices: Universality of local spectral statistics of non-Hermitian matrices. Ann. Probab. 43(2):782 –874.Google Scholar
Thoppe G, Borkar V (2019) A concentration bound for stochastic approximation via Alekseev’s formula. Stochastic Systems 9(1):1–26.Link, Google Scholar
Tsitsiklis J, Van Roy B (1997) An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control 42(5):674–690.Google Scholar
Yang L, Wang M (2019) Sample-optimal parametric Q-learning using linearly additive features. Proc. 36th Internat. Conf. Machine Learn. (PMLR, New York), 6995–7004.Google Scholar
Yang Z, Jin C, Wang Z, Wang M, Jordan M (2020) On function approximation in reinforcement learning: Optimism in the face of large state spaces. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. NIPS’20: Proc. 34th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 13903–13916.Google Scholar

Volume 16, Issue 1

March 2026

Pages 1-107

Article Information

Metrics

Information

Received:December 16, 2023
Accepted:October 24, 2025
Published Online:December 26, 2025

Cite as

Siddharth Chandak, Vivek S. Borkar (2025) A Concentration Bound for TD(0) with Function Approximation. Stochastic Systems 16(1):44-60.

https://doi.org/10.1287/stsy.2023.0055

Keywords

PDF download

Available Issues

Available Issues

Available Issues

A Concentration Bound for TD(0) with Function Approximation

References

Volume 16, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News