Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm

Sumit Kunnumkal
Indian School of Business, Gachibowli, Hyderabad 500032, India

Huseyin Topaloglu
School of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

Published Online: 25 Feb 2008
https://doi.org/10.1287/ijoc.1070.0240

Abstract

This paper shows how to exploit the structural properties of the underlying Markov decision problem to improve the convergence behavior of the Q-learning algorithm. In particular, we consider infinite-horizon discounted-cost Markov decision problems where there is a natural ordering between the states of the system and the value function is known to be monotone in the state. We propose a new variant of the Q-learning algorithm that ensures that the value function approximations obtained during the intermediate iterations are also monotone in the state. We establish the convergence of the proposed algorithm and experimentally show that it significantly improves the convergence behavior of the standard version of the Q-learning algorithm.
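
As a rough illustration of the idea in the abstract, the Python sketch below runs tabular Q-learning on a small toy problem and restores monotonicity in the state after every update. The projection rule used here (raise the entries at higher states, and lower the entries at lower states, until they agree with the freshly updated entry), the toy cost and transition structure, the step size rule, and the names `monotone_q_update` and `run_toy` are all assumptions made for illustration. They are not taken from the paper and may differ from the algorithm it proposes.

```python
import random


def monotone_q_update(q, s, a, target, step):
    """Smooth q[s][a] toward target, then keep q[.][a] nondecreasing in the state.

    Assumed projection rule (illustrative only): entries at higher states are
    raised to the updated value if they fall below it, and entries at lower
    states are lowered to it if they exceed it.
    """
    q[s][a] = (1.0 - step) * q[s][a] + step * target
    v = q[s][a]
    for t in range(len(q)):
        if t > s and q[t][a] < v:
            q[t][a] = v
        elif t < s and q[t][a] > v:
            q[t][a] = v
    return q


def run_toy(n_states=5, iters=2000, discount=0.9, seed=0):
    """Q-learning on a hypothetical discounted-cost problem with ordered states."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # all zeros: monotone to start
    s = 0
    for k in range(1, iters + 1):
        a = rng.randrange(2)  # explore both actions uniformly
        if a == 0:
            # Do nothing: cost grows with the state, and the state may drift up.
            cost = float(s)
            s_next = min(s + 1, n_states - 1) if rng.random() < 0.5 else s
        else:
            # Intervene: pay a fixed extra cost and move one state down.
            cost = float(s) + 2.0
            s_next = max(s - 1, 0)
        # Cost-minimizing target: one-step cost plus discounted best next value.
        target = cost + discount * min(q[s_next])
        monotone_q_update(q, s, a, target, 1.0 / k ** 0.6)
        s = s_next
    return q
```

Because the table starts out monotone and each call to `monotone_q_update` only clamps entries toward the updated value, every intermediate table in this sketch stays nondecreasing in the state for each action, which is the property the abstract asks of the intermediate approximations.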

INFORMS Journal on Computing

Volume 20, Issue 2

Spring 2008

Pages 169-331

Article Information

Received: July 01, 2005
Accepted: September 01, 2007
Published Online: February 25, 2008

Cite as

Sumit Kunnumkal, Huseyin Topaloglu (2008) Exploiting the Structural Properties of the Underlying Markov Decision Problem in the Q-Learning Algorithm. INFORMS Journal on Computing 20(2):288-301.

https://doi.org/10.1287/ijoc.1070.0240
