Multiple Policy Improvements in Undiscounted Markov Renewal Programming

Paul J. Schweitzer
Institute for Defense Analyses, Arlington, Virginia

Published Online: 1 Jun 1971
https://doi.org/10.1287/opre.19.3.784

Abstract

This paper examines, for undiscounted unichain Markov renewal programming, both the Hastings policy-value iteration algorithm and the case of multiple policy improvements between each policy evaluation. The modified policy improvement procedure proposed by Hastings either increases the gain rate or maintains it, and has a larger value improvement in some transient state than in all recurrent states. This prevents cycling and ensures convergence of the policy-value iteration algorithm. Multiple policy improvements, using either the unmodified or modified policy-improvement procedure, are shown to settle ultimately upon higher-gain policies, if any exist. The iterated policy improvements, each time using the improved values, also lead to upper and lower bounds on the maximal gain rate.
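The gain-rate bounds mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard policy-improvement test quantity for Markov renewal programs, (q + P v - v_i) / T, evaluated for a trial value vector v; in a unichain model the smallest and largest of the per-state maxima then bracket the maximal gain rate. The names `gain_bounds`, `q`, `P`, `T`, and `v`, and the two-state data, are illustrative assumptions and are not taken from the paper, whose notation and exact form of the bounds may differ.

```python
def gain_bounds(q, P, T, v):
    """Return (lower, upper) bounds on the maximal gain rate.

    q[i][k]    : expected reward per transition in state i under action k
    P[i][k][j] : probability of moving from state i to state j under action k
    T[i][k]    : expected holding time in state i under action k (must be > 0)
    v[j]       : any trial relative-value vector
    """
    best = []
    for i in range(len(v)):
        ratios = []
        for k in range(len(q[i])):
            expected_next = sum(p * vj for p, vj in zip(P[i][k], v))
            # Test quantity: reward rate implied by action k given values v.
            ratios.append((q[i][k] + expected_next - v[i]) / T[i][k])
        best.append(max(ratios))  # best action in state i
    return min(best), max(best)


# Hypothetical two-state example: state 0 has two actions, state 1 has one.
q = [[1.0, 3.0], [2.0]]
P = [[[0.5, 0.5], [0.0, 1.0]], [[1.0, 0.0]]]
T = [[1.0, 2.0], [1.0]]
v = [1.0, 0.0]

lower, upper = gain_bounds(q, P, T, v)
print(lower, upper)  # 1.0 3.0
```

For this data the better of the two stationary policies has gain rate 5/3, which lies inside the computed interval [1, 3]. Repeating the computation with improved values would be expected to tighten the interval, in the spirit of the iterated improvements the abstract describes.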

Volume 19, Issue 3

May-June 1971

Pages 559-841

Cite as

Paul J. Schweitzer (1971) Multiple Policy Improvements in Undiscounted Markov Renewal Programming. Operations Research 19(3):784-793.

https://doi.org/10.1287/opre.19.3.784
