Forward Recursion for Markov Decision Processes with Skip-Free-to-the-Right Transitions, Part I: Theory and Algorithm
Abstract
We consider a Markovian decision process with countable state space (states 0, 1, 2, …) which is skip-free to the right (a transition from i to j is impossible if j > i + 1). In this type of system it is easy to calculate by forward recursion the maximal total expected reward going from state 0 to state i; the same can be done, of course, for the case where a constant g is subtracted from the one-period reward function (g-revised reward). Let −wg(i) be the maximal total expected g-revised reward going from state 0 to state i. We show that wg(·) satisfies the average-reward optimality equation. If wg(·) satisfies a growth condition, then g = g*, the maximal average reward. For all other g, the function wg increases or decreases so fast that this cannot be the case. Thus, in principle the solution wg can be used to check if g < g* or g > g*, which suggests a method for approximating g* and an associated average-return optimal policy. We develop an efficient algorithm based on this idea. In a companion paper we shall show how the algorithm, or modifications of it, can be applied to some special cases, such as control of arrivals to a queue, control of the service rate, and controlled random walks.

