Forward Recursion for Markov Decision Processes with Skip-Free-to-the-Right Transitions, Part I: Theory and Algorithm

Published Online:https://doi.org/10.1287/moor.11.2.295

We consider a Markovian decision process with countable state space (states 0, 1, 2, …) which is skip-free to the right (a transition from i to j is impossible if j > i + 1). In this type of system it is easy to calculate by forward recursion the maximal total expected reward going from state 0 to state i; the same can be done, of course, for the case where a constant g is subtracted from the one-period reward function (g-revised reward). Let −wg(i) be the maximal total expected g-revised reward going from state 0 to state i. We show that wg(·) satisfies the average-reward optimality equation. If wg(·) satisfies a growth condition, then g = g*, the maximal average reward. For all other g, the function wg increases or decreases so fast that this cannot be the case. Thus, in principle the solution wg can be used to check if g < g* or g > g*, which suggests a method for approximating g* and an associated average-return optimal policy. We develop an efficient algorithm based on this idea. In a companion paper we shall show how the algorithm, or modifications of it, can be applied to some special cases, such as control of arrivals to a queue, control of the service rate, and controlled random walks.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.