Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm

Sumit Kunnumkal
Indian School of Business, Gachibowli, Hyderabad 500032, India

Huseyin Topaloglu
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

Published Online: 25 Feb 2008
https://doi.org/10.1287/ijoc.1070.0240

References

Andradottir S. A stochastic approximation algorithm with varying bounds. Oper. Res. (1995) 43:1037–1048
Barto A. G., Bradtke S. J., Singh S. P. Learning to act using real-time dynamic programming. Artificial Intelligence (1995) 72:81–138
Bertsekas D. P., Tsitsiklis J. N. Neuro-Dynamic Programming (1996) (Athena Scientific, Belmont, MA)
de Farias D. P., Van Roy B. The linear programming approach to approximate dynamic programming. Oper. Res. (2003) 51:850–865
Deb R. K., Serfozo R. F. Optimal control of batch service queues. Adv. Appl. Probab. (1973) 5:340–361
Ding X. Estimation and optimization in discrete inventory models. (2002) Ph.D. thesis, The University of British Columbia, Vancouver
Ignall E., Kolesar P. Operating characteristics of a simple shuttle under local dispatching rules. Oper. Res. (1972) 20:1077–1088
Ignall E., Kolesar P. Operating characteristics of an infinite capacity shuttle: Control at a single terminal. Oper. Res. (1974) 22:1008–1024
Kosten L. Stochastic Theory of Service Systems (1973) (Pergamon Press, New York)
Kushner H. J., Clark D. S. Stochastic Approximation Methods for Constrained and Unconstrained Systems (1978) (Springer-Verlag, Berlin)
Ljung L. Analysis of recursive stochastic algorithms. IEEE Trans. Automatic Control (1977) 22:551–575
Papadaki K., Powell W. B. Exploiting structure in adaptive dynamic programming algorithms for a stochastic batch service problem. Eur. J. Oper. Res. (2002) 142:108–127
Papadaki K., Powell W. B. An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem. Naval Res. Logist. (2003) 50:742–769
Powell W. B., Ruszczynski A., Topaloglu H. Learning algorithms for separable approximations of stochastic optimization problems. Math. Oper. Res. (2004) 29:814–836
Puterman M. L. Markov Decision Processes (1994) (John Wiley & Sons, New York)
Schweitzer P., Seidmann A. Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl. (1985) 110:568–582
Si J., Barto A. G., Powell W. B., Wunsch D. Handbook of Learning and Approximate Dynamic Programming (2004) (Wiley-Interscience, Piscataway, NJ)
Sutton R. S., Barto A. G. Reinforcement Learning (1998) (The MIT Press, Cambridge, MA)
Topaloglu H., Powell W. B. Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems. INFORMS J. Comput. (2006) 18:31–42
Tsitsiklis J. N. Asynchronous stochastic approximation and Q-learning. Machine Learn. (1994) 16:185–202
Tsitsiklis J. N., Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control (1997) 42:674–690
Watkins C. J. C. H. Learning from delayed rewards. (1989) Ph.D. thesis, Cambridge University, Cambridge, UK
Watkins C. J. C. H., Dayan P. Q-learning. Machine Learn. (1992) 8:279–292

INFORMS Journal on Computing

Volume 20, Issue 2

Spring 2008

Pages 169-331

Information

Received: July 01, 2005
Accepted: September 01, 2007
Published Online: February 25, 2008

Cite as

Sumit Kunnumkal, Huseyin Topaloglu (2008) Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm. INFORMS Journal on Computing 20(2):288-301.

https://doi.org/10.1287/ijoc.1070.0240
