Analyzing Approximate Value Iteration Algorithms

Arunselvan Ramaswamy
https://orcid.org/0000-0001-7547-8111
Department of Computer Science, Paderborn University, 33098 Paderborn, Germany
Shalabh Bhatnagar
https://orcid.org/0000-0001-7644-3914
Department of Computer Science and Automation and the Robert Bosch Center for Cyber-Physical Systems, Indian Institute of Science, Bengaluru 560012, India

Published Online: 30 Dec 2021, https://doi.org/10.1287/moor.2021.1202

Mathematics of Operations Research

Volume 47, Issue 3

August 2022

Pages 1707-2545, C2

Received: June 21, 2019
Accepted: July 04, 2021
Published Online: December 30, 2021

Cite as

Arunselvan Ramaswamy, Shalabh Bhatnagar (2022) Analyzing Approximate Value Iteration Algorithms. Mathematics of Operations Research 47(3):2138-2159.

https://doi.org/10.1287/moor.2021.1202
