Multiple Policy Improvements in Undiscounted Markov Renewal Programming

Published Online:https://doi.org/10.1287/opre.19.3.784

This paper examines, for undiscounted unichain Markov renewal programming, both the Hastings policy-value iteration algorithm and the case of multiple policy improvements between each policy evaluation. The modified policy improvement procedure proposed by Hastings either increases the gain rate or maintains it, and has a larger value improvement in some transient state than in all recurrent states. This prevents cycling and ensures convergence of the policy-value iteration algorithm. Multiple policy improvements, using either the unmodified or modified policy-improvement procedure, are shown to settle ultimately upon higher-gain policies, if any exist. The iterated policy improvements, each time using the improved values, also lead to upper and lower bounds on the maximal gain rate.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.