Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm
Published Online: 25 Feb 2008. https://doi.org/10.1287/ijoc.1070.0240

