On Boundedness of Q-Learning Iterates for Stochastic Shortest Path Problems
Published Online: 28 Nov 2012, https://doi.org/10.1287/moor.1120.0562

