Empirical Dynamic Programming
Published Online:5 Jan 2016https://doi.org/10.1287/moor.2015.0733
References
- (2001) Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40(3):681–698.Crossref, Google Scholar
- (2008) Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 47(5):2303–2347.Crossref, Google Scholar
- (2009) Neural Network Learning: Theoretical Foundations (Cambridge University Press, Cambridge, UK).Google Scholar
- (1981) Associative search network: A reinforcement learning associative memory. Biol. Cybernet. 40(3):201–211.Crossref, Google Scholar
- (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
- (1959) Functional approximations and dynamic programming. Math. Tables and Other Aids Comput. 13(68):247–251.Crossref, Google Scholar
- (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.Crossref, Google Scholar
- (2012) Dynamic Programming and Optimal Control, Vols. I and II, 4th ed. (Athena Scientific, Nashua, NH).Google Scholar
- (1996) Neuro-Dynamic Programming (Athena Scientific, Nashua, NH).Google Scholar
- (2002) Q-learning for risk-sensitive control. Math. Oper. Res. 27(2):294–311.Link, Google Scholar
- (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2):447–469.Crossref, Google Scholar
- (2006) Simulation-Based Algorithms for Markov Decision Processes (Springer, Berlin).Google Scholar
- (2007) A survey of some simulation-based algorithms for Markov decision processes. Comm. Inf. Syst. 7(1):59–92.Google Scholar
- (2002) Monotone Random Systems Theory and Applications, Vol. 1779 (Springer, Berlin).Crossref, Google Scholar
- (2012) Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Oper. Res. 60(5):1267–1281.Link, Google Scholar
- (2003) Convergence of simulation-based policy iteration. Probab. Engrg. Inform. Sci. 17(2):213–234.Crossref, Google Scholar
- (2004) Learning rates for q-learning. J. Mach. Learn. Res 5:1–25.Google Scholar
- (1971) Dynamic Probabilistic Systems: Vol.: 2.: Semi-Markov and Decision Processes (John Wiley & Sons, New York).Google Scholar
- (2006) Simulation-based uniform value function estimates of Markov decision processes. SIAM J. Control Optim. 45(5):1633–1656.Crossref, Google Scholar
- (2010) Simulation-based optimization of Markov decision processes: An empirical process theory approach. Automatica 46(8):1297–1304.Crossref, Google Scholar
- (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University of London, London.Google Scholar
- (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.Crossref, Google Scholar
- (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM J. Control Optim. 38(1):94–123.Crossref, Google Scholar
- (2004) Convergence rate of linear two-time-scale stochastic approximation. Ann. Appl. Probab. 14(2):796–819.Crossref, Google Scholar
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems (Springer, New York).Crossref, Google Scholar
- (2009) Markov Chains and Mixing Times (AMS, Providence, RI).Google Scholar
- (1977) Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Control 22(4):551–575.Crossref, Google Scholar
- (1961) Steps toward artificial intelligence. Proc. IRE 49(1):8–30.Crossref, Google Scholar
- (2002) Comparison Methods for Stochastic Models and Risks, Vol. 389 (Wiley, Hoboken, NJ).Google Scholar
- (2008) Finite time bounds for fitted value iteration. J. Machine Learn. Res. 1:815–857.Google Scholar
- (2012) Learning Automata: An Introduction (Dover Publications, Mineola, NY).Google Scholar
- (1966) Introduction to Dynamic Programming (John Wiley & Sons, New York).Google Scholar
- (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
- (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (Wiley, Hoboken, NJ).Crossref, Google Scholar
- (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics 26(4):590–598.Crossref, Google Scholar
- (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
- (1996) Stochastic Processes (John Wiley & Sons, New York).Google Scholar
- (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516.Crossref, Google Scholar
- (2007) Stochastic Orders (Springer, New York).Crossref, Google Scholar
- (1953) Stochastic games. Proc. Nat. Acad. Sci. 39(10):1095–1100.Crossref, Google Scholar
- (1995) Probability (Springer, New York).Google Scholar
- (1998) Reinforcement Learning: An Introduction (Cambridge University Press, Cambridge, UK).Google Scholar
- (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan and Claypool Publishers, San Rafael, CA).Google Scholar
- (1985) A class of rapidly convergent algorithms for learning automata. IEEE Trans. Systems, Man, and Cybernetics SMC-15(1):168–175.Crossref, Google Scholar
- (2003) On the convergence of optimistic policy iteration. J. Mach. Learn. Res. 3:59–72.Google Scholar
- (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
- (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.Google Scholar
- (1978) Approximations of dynamic programs, I. Math. Oper. Res 3(3):231–243.Link, Google Scholar
- (1979) Approximations of dynamic programs, II. Math. Oper. Res 4(2):179–185.Link, Google Scholar

