Analyzing Approximate Value Iteration Algorithms
Published Online: 30 Dec 2021
https://doi.org/10.1287/moor.2021.1202

