Empirical Dynamic Programming

William B. Haskell
William B. Haskell
[email protected]
Department of Industrial and Systems Engineering, National University of Singapore, Singapore 117576
Search for more papers by this author
,
Rahul Jain
Rahul Jain
[email protected]
Electrical Engineering, Computer Science, and Industrial and Systems Engineering Departments, University of Southern California, Los Angeles, California 90089
Search for more papers by this author
,
Dileep Kalathil
Dileep Kalathil
[email protected]
Electrical Engineering and Computer Science Department, University of California, Berkeley, Berkeley, California 94720
Search for more papers by this author

Department of Industrial and Systems Engineering, National University of Singapore, Singapore 117576

Electrical Engineering, Computer Science, and Industrial and Systems Engineering Departments, University of Southern California, Los Angeles, California 90089

Search for more papers by this author

Dileep Kalathil

[email protected]

Electrical Engineering and Computer Science Department, University of California, Berkeley, Berkeley, California 94720

Search for more papers by this author

Published Online:5 Jan 2016https://doi.org/10.1287/moor.2015.0733

References

Abounadi J, Bertsekas D, Borkar VS (2001) Learning algorithms for Markov decision processes with average cost. SIAM J. Control Optim. 40(3):681–698.Crossref, Google Scholar
Almudevar A (2008) Approximate fixed point iteration with an application to infinite horizon Markov decision processes. SIAM J. Control Optim. 47(5):2303–2347.Crossref, Google Scholar
Anthony M, Bartlett P (2009) Neural Network Learning: Theoretical Foundations (Cambridge University Press, Cambridge, UK).Google Scholar
Barto A, Sutton S, Brouwer P (1981) Associative search network: A reinforcement learning associative memory. Biol. Cybernet. 40(3):201–211.Crossref, Google Scholar
Bellman R (1957) Dynamic Programming (Princeton University Press, Princeton, NJ).Google Scholar
Bellman R, Dreyfus S (1959) Functional approximations and dynamic programming. Math. Tables and Other Aids Comput. 13(68):247–251.Crossref, Google Scholar
Bertsekas D (2011) Approximate policy iteration: A survey and some new methods. J. Control Theory Appl. 9(3):310–335.Crossref, Google Scholar
Bertsekas D (2012) Dynamic Programming and Optimal Control, Vols. I and II, 4th ed. (Athena Scientific, Nashua, NH).Google Scholar
Bertsekas D, Tsitsiklis J (1996) Neuro-Dynamic Programming (Athena Scientific, Nashua, NH).Google Scholar
Borkar VS (2002) Q-learning for risk-sensitive control. Math. Oper. Res. 27(2):294–311.Link, Google Scholar
Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Borkar VS, Meyn SP (2000) The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim. 38(2):447–469.Crossref, Google Scholar
Chang HS, Fu M, Hu J, Marcus SI (2006) Simulation-Based Algorithms for Markov Decision Processes (Springer, Berlin).Google Scholar
Chang HS, Fu M, Hu J, Marcus SI (2007) A survey of some simulation-based algorithms for Markov decision processes. Comm. Inf. Syst. 7(1):59–92.Google Scholar
Chueshov I (2002) Monotone Random Systems Theory and Applications, Vol. 1779 (Springer, Berlin).Crossref, Google Scholar
Cooper W, Rangarajan B (2012) Performance guarantees for empirical Markov decision processes with applications to multiperiod inventory models. Oper. Res. 60(5):1267–1281.Link, Google Scholar
Cooper W, Henderson S, Lewis M (2003) Convergence of simulation-based policy iteration. Probab. Engrg. Inform. Sci. 17(2):213–234.Crossref, Google Scholar
Even-Dar E, Mansour Y (2004) Learning rates for q-learning. J. Mach. Learn. Res 5:1–25.Google Scholar
Howard R (1971) Dynamic Probabilistic Systems: Vol.: 2.: Semi-Markov and Decision Processes (John Wiley & Sons, New York).Google Scholar
Jain R, Varaiya P (2006) Simulation-based uniform value function estimates of Markov decision processes. SIAM J. Control Optim. 45(5):1633–1656.Crossref, Google Scholar
Jain R, Varaiya P (2010) Simulation-based optimization of Markov decision processes: An empirical process theory approach. Automatica 46(8):1297–1304.Crossref, Google Scholar
Kakade SM (2003) On the sample complexity of reinforcement learning. Ph.D. thesis, University of London, London.Google Scholar
Kiefer J, Wolfowitz J (1952) Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23(3):462–466.Crossref, Google Scholar
Konda VR, Borkar VS (1999) Actor-critic-type learning algorithms for Markov decision processes. SIAM J. Control Optim. 38(1):94–123.Crossref, Google Scholar
Konda VR, Tsitsiklis J (2004) Convergence rate of linear two-time-scale stochastic approximation. Ann. Appl. Probab. 14(2):796–819.Crossref, Google Scholar
Kushner HJ, Clark DS (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems (Springer, New York).Crossref, Google Scholar
Levin D, Peres Y, Wilmer EL (2009) Markov Chains and Mixing Times (AMS, Providence, RI).Google Scholar
Ljung L (1977) Analysis of recursive stochastic algorithms. IEEE Trans. Automat. Control 22(4):551–575.Crossref, Google Scholar
Minsky M (1961) Steps toward artificial intelligence. Proc. IRE 49(1):8–30.Crossref, Google Scholar
Müller A, Stoyan D (2002) Comparison Methods for Stochastic Models and Risks, Vol. 389 (Wiley, Hoboken, NJ).Google Scholar
Munos R, Szepesvari C (2008) Finite time bounds for fitted value iteration. J. Machine Learn. Res. 1:815–857.Google Scholar
Narendra KS, Thathachar MAL (2012) Learning Automata: An Introduction (Dover Publications, Mineola, NY).Google Scholar
Nemhauser GL (1966) Introduction to Dynamic Programming (John Wiley & Sons, New York).Google Scholar
Papadimitriou CH, Tsitsiklis J (1987) The complexity of Markov decision processes. Math. Oper. Res. 12(3):441–450.Link, Google Scholar
Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality, 2nd ed. (Wiley, Hoboken, NJ).Crossref, Google Scholar
Rajaraman K, Sastry PS (1996) Finite time analysis of the pursuit algorithm for learning automata. IEEE Trans. Systems, Man, and Cybernetics, Part B: Cybernetics 26(4):590–598.Crossref, Google Scholar
Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
Ross SM (1996) Stochastic Processes (John Wiley & Sons, New York).Google Scholar
Rust J (1997) Using randomization to break the curse of dimensionality. Econometrica 65(3):487–516.Crossref, Google Scholar
Shaked M, Shanthikumar JG (2007) Stochastic Orders (Springer, New York).Crossref, Google Scholar
Shapley LS (1953) Stochastic games. Proc. Nat. Acad. Sci. 39(10):1095–1100.Crossref, Google Scholar
Shiryaev A (1995) Probability (Springer, New York).Google Scholar
Sutton RS, Barto AG (1998) Reinforcement Learning: An Introduction (Cambridge University Press, Cambridge, UK).Google Scholar
Szepesvari C (2010) Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning (Morgan and Claypool Publishers, San Rafael, CA).Google Scholar
Thathachar MAL, Sastry PS (1985) A class of rapidly convergent algorithms for learning automata. IEEE Trans. Systems, Man, and Cybernetics SMC-15(1):168–175.Crossref, Google Scholar
Tsitsiklis J (2003) On the convergence of optimistic policy iteration. J. Mach. Learn. Res. 3:59–72.Google Scholar
Watkins C, Dayan P (1992) Q-learning. Machine Learn. 8(3–4):279–292.Crossref, Google Scholar
Werbos P (1974) Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University, Cambridge, MA.Google Scholar
Whitt W (1978) Approximations of dynamic programs, I. Math. Oper. Res 3(3):231–243.Link, Google Scholar
Whitt W (1979) Approximations of dynamic programs, II. Math. Oper. Res 4(2):179–185.Link, Google Scholar

cover image Mathematics of Operations Research

Volume 41, Issue 2

May 2016

Pages 377-744

Article Information

Metrics

Information

Received:November 21, 2013
Published Online:January 05, 2016

Cite as

William B. Haskell, Rahul Jain, Dileep Kalathil (2016) Empirical Dynamic Programming. Mathematics of Operations Research 41(2):402-429.

https://doi.org/10.1287/moor.2015.0733

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Empirical Dynamic Programming

References

Volume 41, Issue 2

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News