On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Huizhen Yu
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Dimitri P. Bertsekas
Laboratory for Information and Decision Systems and Department of EECS, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 28 Nov 2012
https://doi.org/10.1287/moor.1120.0562

Abstract

We consider a totally asynchronous stochastic approximation algorithm, Q-learning, for solving finite space stochastic shortest path (SSP) problems, which are undiscounted, total cost Markov decision processes with an absorbing and cost-free state. For the most commonly used SSP models, existing convergence proofs assume that the sequence of Q-learning iterates is bounded with probability one, or some other condition that guarantees boundedness. We prove that the sequence of iterates is naturally bounded with probability one, thus furnishing the boundedness condition in the convergence proof by Tsitsiklis [Tsitsiklis JN (1994) Asynchronous stochastic approximation and Q-learning. Machine Learn. 16:185–202] and establishing completely the convergence of Q-learning for these SSP models.
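The abstract refers to Q-learning for SSP problems without stating the iteration. Below is a minimal Python sketch, under assumptions of our own, of the standard Q-learning update for an SSP whose termination state is 0: a single component (i, u) is selected, a successor state j and cost g are sampled, and Q(i, u) is replaced by (1 − γ)Q(i, u) + γ(g + min_v Q(j, v)), with Q(0, ·) held at 0. The two-state model, the uniform random choice of components, the stepsize γ = 1/(number of updates of (i, u)), and the names TRANSITIONS, sample_transition, min_q and q_learning_ssp are all hypothetical illustrations, not taken from the paper. The sketch also omits the outdated (delayed) information that a totally asynchronous scheme permits. Every policy in the toy model terminates with probability one, so it illustrates the iteration itself, not the paper's boundedness argument.

```python
import random

# Hypothetical toy SSP for illustration only (not from the paper).
# State 0 is the absorbing, cost-free termination state.
# TRANSITIONS[(i, u)] = list of (probability, next_state, cost).
TRANSITIONS = {
    (1, 0): [(1.0, 0, 1.0)],
    (1, 1): [(1.0, 2, 0.5)],
    (2, 0): [(1.0, 0, 2.0)],
    (2, 1): [(0.5, 1, 0.1), (0.5, 0, 0.1)],
}
CONTROLS = {1: [0, 1], 2: [0, 1]}


def sample_transition(i, u, rng):
    """Draw (next_state, cost) from the transition distribution of (i, u)."""
    r = rng.random()
    acc = 0.0
    for prob, j, cost in TRANSITIONS[(i, u)]:
        acc += prob
        if r < acc:
            return j, cost
    # Guard against floating-point round-off in the cumulative sum.
    _, j, cost = TRANSITIONS[(i, u)][-1]
    return j, cost


def min_q(q, j):
    """Return min_v Q(j, v); Q(0, .) is fixed at 0 for the termination state."""
    if j == 0:
        return 0.0
    return min(q[(j, v)] for v in CONTROLS[j])


def q_learning_ssp(num_updates=200_000, seed=0):
    """Hypothetical asynchronous Q-learning run; returns (Q, largest |Q| seen)."""
    rng = random.Random(seed)
    q = {pair: 0.0 for pair in TRANSITIONS}     # initial iterate Q_0 = 0
    counts = {pair: 0 for pair in TRANSITIONS}  # per-component update counts
    pairs = list(TRANSITIONS)
    max_abs = 0.0                               # monitors boundedness of iterates
    for _ in range(num_updates):
        # Asynchronous step: only one randomly chosen component (i, u) changes.
        i, u = rng.choice(pairs)
        j, cost = sample_transition(i, u, rng)
        counts[(i, u)] += 1
        step = 1.0 / counts[(i, u)]  # sum of steps = inf, sum of squares < inf
        target = cost + min_q(q, j)
        q[(i, u)] = (1.0 - step) * q[(i, u)] + step * target
        max_abs = max(max_abs, abs(q[(i, u)]))
    return q, max_abs
```

For this toy model, solving Bellman's equation by hand gives J*(1) = 1 and J*(2) = 0.6, hence Q*(1,0) = 1, Q*(1,1) = 1.1, Q*(2,0) = 2 and Q*(2,1) = 0.6. Calling `q, max_abs = q_learning_ssp()` should return Q-factors close to these values, with `max_abs` recording the largest magnitude any iterate reached along the way.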

Mathematics of Operations Research

Volume 38, Issue 2

May 2013

Issue pages 209-392

Article Information

Received: June 6, 2011
Published Online: November 28, 2012

Cite as

Huizhen Yu, Dimitri P. Bertsekas, (2012) On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems. Mathematics of Operations Research 38(2):209-227.

https://doi.org/10.1287/moor.1120.0562
