On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems

Huizhen Yu
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems and Department of EECS, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 28 Nov 2012
https://doi.org/10.1287/moor.1120.0562

Mathematics of Operations Research

Volume 38, Issue 2

May 2013

Pages 209-392

Received: June 06, 2011
Published Online: November 28, 2012

Cite as

Huizhen Yu, Dimitri P. Bertsekas (2012) On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems. Mathematics of Operations Research 38(2):209-227.

https://doi.org/10.1287/moor.1120.0562
