Empirical Dynamic Programming

Published Online:https://doi.org/10.1287/moor.2015.0733

References

  • Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40(3):681–698.CrossrefGoogle Scholar
  • Almudevar A (2008) Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 47(5):2303–2347.CrossrefGoogle Scholar
  • Anthony M, Bartlett P (2009) Neural Network Learning: Theoretical Foundations (Cambridge University Press, Cambridge, UK).Google Scholar
  • Barto A, Sutton S, Brouwer P (1981) Associative search network: A reinforcement learning associative memory. Biol. Cybernet. 40(3):201–211.CrossrefGoogle Scholar
  • Bellman R (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
  • Bellman R, Dreyfus S (1959) Functional approximations and dynamic programming. Math. Tables and Other Aids Comput. 13(68):247–251.CrossrefGoogle Scholar
  • Bertsekas D (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.CrossrefGoogle Scholar
  • Bertsekas D (2012) Dynamic Programming and Optimal Control, Vols. I and II, 4th ed. (Athena Scientific, Nashua, NH).Google Scholar
  • Bertsekas D, Tsitsiklis J (1996) Neuro-Dynamic Programming (Athena Scientific, Nashua, NH).Google Scholar
  • Borkar VS (2002) Q-learning for risk-sensitive control. Math. Oper. Res. 27(2):294–311.LinkGoogle Scholar
  • Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2):447–469.CrossrefGoogle Scholar
  • Chang HS, Fu M, Hu J, Marcus SI (2006) Simulation-Based Algorithms for Markov Decision Processes (Springer, Berlin).Google Scholar
  • Chang HS, Fu M, Hu J, Marcus SI (2007) A survey of some simulation-based algorithms for Markov decision processes. Comm. Inf. Syst. 7(1):59–92.Google Scholar
  • Chueshov I (2002) Monotone Random Systems Theory and Applications, Vol. 1779 (Springer, Berlin).CrossrefGoogle Scholar
  • Cooper W, Rangarajan B (2012) Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Oper. Res. 60(5):1267–1281.LinkGoogle Scholar
  • Cooper W, Henderson S, Lewis M (2003) Convergence of simulation-based policy iteration. Probab. Engrg. Inform. Sci. 17(2):213–234.CrossrefGoogle Scholar
  • Even-Dar E, Mansour Y (2004) Learning rates for q-learning. J. Mach. Learn. Res 5:1–25.Google Scholar
  • Howard R (1971) Dynamic Probabilistic Systems: Vol.: 2.: Semi-Markov and Decision Processes (John Wiley & Sons, New York).Google Scholar
  • Jain R, Varaiya P (2006) Simulation-based uniform value function estimates of Markov decision processes. SIAM J. Control Optim. 45(5):1633–1656.CrossrefGoogle Scholar
  • Jain R, Varaiya P (2010) Simulation-based optimization of Markov decision processes: An empirical process theory approach. Automatica 46(8):1297–1304.CrossrefGoogle Scholar
  • Kakade SM (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University of London, London.Google Scholar
  • Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.CrossrefGoogle Scholar
  • Konda VR, Borkar VS (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM J. Control Optim. 38(1):94–123.CrossrefGoogle Scholar
  • Konda VR, Tsitsiklis J (2004) Convergence rate of linear two-time-scale stochastic approximation. Ann. Appl. Probab. 14(2):796–819.CrossrefGoogle Scholar
  • Kushner HJ, Clark DS (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems (Springer, New York).CrossrefGoogle Scholar
  • Levin D, Peres Y, Wilmer EL (2009) Markov Chains and Mixing Times (AMS, Providence, RI).Google Scholar
  • Ljung L (1977) Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Control 22(4):551–575.CrossrefGoogle Scholar
  • Minsky M (1961) Steps toward artificial intelligence. Proc. IRE 49(1):8–30.CrossrefGoogle Scholar
  • Müller A, Stoyan D (2002) Comparison Methods for Stochastic Models and Risks, Vol. 389 (Wiley, Hoboken, NJ).Google Scholar
  • Munos R, Szepesvari C (2008) Finite time bounds for fitted value iteration. J. Machine Learn. Res. 1:815–857.Google Scholar
  • Narendra KS, Thathachar MAL (2012) Learning Automata: An Introduction (Dover Publications, Mineola, NY).Google Scholar
  • Nemhauser GL (1966) Introduction to Dynamic Programming (John Wiley & Sons, New York).Google Scholar
  • Papadimitriou CH, Tsitsiklis J (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.LinkGoogle Scholar
  • Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (Wiley, Hoboken, NJ).CrossrefGoogle Scholar
  • Rajaraman K, Sastry PS (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics 26(4):590–598.CrossrefGoogle Scholar
  • Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • Ross SM (1996) Stochastic Processes (John Wiley & Sons, New York).Google Scholar
  • Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516.CrossrefGoogle Scholar
  • Shaked M, Shanthikumar JG (2007) Stochastic Orders (Springer, New York).CrossrefGoogle Scholar
  • Shapley LS (1953) Stochastic games. Proc. Nat. Acad. Sci. 39(10):1095–1100.CrossrefGoogle Scholar
  • Shiryaev A (1995) Probability (Springer, New York).Google Scholar
  • Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (Cambridge University Press, Cambridge, UK).Google Scholar
  • Szepesvari C (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan and Claypool Publishers, San Rafael, CA).Google Scholar
  • Thathachar MAL, Sastry PS (1985) A class of rapidly convergent algorithms for learning automata. IEEE Trans. Systems, Man, and Cybernetics SMC-15(1):168–175.CrossrefGoogle Scholar
  • Tsitsiklis J (2003) On the convergence of optimistic policy iteration. J. Mach. Learn. Res. 3:59–72.Google Scholar
  • Watkins C, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.CrossrefGoogle Scholar
  • Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.Google Scholar
  • Whitt W (1978) Approximations of dynamic programs, I. Math. Oper. Res 3(3):231–243.LinkGoogle Scholar
  • Whitt W (1979) Approximations of dynamic programs, II. Math. Oper. Res 4(2):179–185.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.