An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes

This paper concerns discrete-time, finite-state multichain MDPs with compact action sets. The optimality criterion is long-run average cost. Simple examples illustrate that optimal stationary Markov policies do not always exist. We establish the existence of ε-optimal policies that are stationary Markovian, and develop an algorithm that computes these approximately optimal policies. We also establish a necessary and sufficient condition for the existence of an optimal policy that is stationary Markovian; when such a policy exists, the algorithm computes it.
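To make the optimality criterion concrete, the following minimal sketch estimates the long-run average cost of one fixed stationary policy by averaging expected per-step costs over a long horizon. It is not the algorithm developed in the paper. The function name `long_run_average_cost`, the three-state transition matrix `P`, and the cost vector `c` are all hypothetical and chosen only for illustration. The chain is multichain (two absorbing states plus one transient state), so the average cost depends on the starting state.

```python
def long_run_average_cost(P, c, start, horizon=100_000):
    """Approximate the long-run average cost from `start` under a fixed policy.

    P is the policy's transition matrix (list of rows), c the per-state cost.
    The result is the finite-horizon average of expected costs, which
    approaches the long-run average cost as `horizon` grows.
    """
    n = len(c)
    dist = [0.0] * n
    dist[start] = 1.0  # state distribution at time 0
    total = 0.0
    for _ in range(horizon):
        # expected cost incurred at the current step
        total += sum(d * ci for d, ci in zip(dist, c))
        # propagate the distribution one step: dist <- dist * P
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
    return total / horizon


# Hypothetical example: states 0 and 1 are absorbing (two recurrent classes),
# state 2 is transient and moves to either class with probability 1/2.
P = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
]
c = [1.0, 3.0, 10.0]

gains = [long_run_average_cost(P, c, s) for s in range(3)]
```

In this example the average cost is 1 from state 0, 3 from state 1, and close to 2 from state 2 (the one-time cost of 10 in the transient state washes out over a long horizon). The state-dependent average cost is what distinguishes the multichain setting from the unichain one.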
