Technical Note—Bounds on the Gain of a Markov Decision Process
Abstract
An algorithm for the steady-state solution of Markov decision problems has been proposed by Howard and modified by Hastings. This note shows, for the case of single-chain Markov decision processes, how bounds on the optimal gain can be obtained at each cycle of the foregoing algorithms. The results extend to Markov renewal programming. Related results are the bounds proposed by Odoni for use with White's value-iteration method of optimization.
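As an illustration of the value-iteration bounds mentioned above, the following is a minimal sketch, not taken from this note. It assumes the standard form of the result for a single-chain, reward-maximizing process: if v' is obtained from v by one value-iteration step, then the smallest and largest components of v' - v bracket the optimal gain. The function names and the two-state, two-action data are hypothetical and chosen only for the example.

```python
# Illustrative sketch only: all names and data are hypothetical, not from the note.
# Assumes a single-chain, reward-maximizing Markov decision process.

def bellman_update(P, r, v):
    """One value-iteration step: (Tv)(i) = max_a [ r[a][i] + sum_j P[a][i][j] * v[j] ]."""
    n = len(v)
    return [
        max(r[a][i] + sum(P[a][i][j] * v[j] for j in range(n)) for a in range(len(P)))
        for i in range(n)
    ]

def gain_bounds(v_old, v_new):
    """Return (lower, upper) bounds on the optimal gain from successive value vectors."""
    diffs = [new - old for old, new in zip(v_old, v_new)]
    return min(diffs), max(diffs)

def iterate_bounds(P, r, n_iter=200, tol=1e-9):
    """Run value iteration from v = 0, stopping once the bounds are within tol."""
    v = [0.0] * len(r[0])
    lo, hi = float("-inf"), float("inf")
    for _ in range(n_iter):
        v_new = bellman_update(P, r, v)
        lo, hi = gain_bounds(v, v_new)
        v = v_new
        if hi - lo < tol:
            break
    return lo, hi

# Hypothetical data: P[a][i][j] is the transition probability from state i to
# state j under action a; r[a][i] is the one-step reward in state i under action a.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # action 0
    [[0.9, 0.1], [0.1, 0.9]],  # action 1
]
r = [
    [1.0, 3.0],  # action 0
    [0.0, 4.0],  # action 1
]

lo, hi = iterate_bounds(P, r)
print("optimal gain lies in [%.9f, %.9f]" % (lo, hi))
```

Because the bounds are available after every step, the gap between them gives a natural stopping test: iteration can end as soon as the interval is narrow enough for the application.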

