On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

Published Online: https://doi.org/10.1287/opre.2024.0818

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI is well studied in the context of discounted and average-cost Markov decision processes (MDPs). In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration are well studied in the context of risk-sensitive MDPs, MPI has not been explored in this setting. To the best of our knowledge, we provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Because the exponential cost formulation involves the multiplicative Bellman equation, our main contribution is a convergence proof that differs substantially from existing results for discounted and risk-neutral average-cost problems, as well as from existing analyses of risk-sensitive value iteration and policy iteration.
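To make the structure of MPI in the multiplicative setting concrete, the following is a minimal sketch and not the algorithm analyzed in the paper. It assumes the multiplicative Bellman equation takes the commonly used form lambda * h(x) = min_a exp(c(x, a)) * sum_y P(y | x, a) * h(y), alternates a greedy policy improvement step with m partial evaluation sweeps under the fixed policy, and rescales h by its value at state 0 after every sweep. The function names, the array layout, the reference-state normalization, and the stopping rule (a fixed iteration count) are all illustrative assumptions.

```python
import numpy as np


def greedy_policy(P, c, h):
    """Greedy step for the assumed multiplicative Bellman operator.

    P: array of shape (A, S, S), P[a, s, y] = transition probability.
    c: array of shape (S, A), per-stage cost.
    h: array of shape (S,), positive relative value function.
    """
    # q[s, a] = exp(c[s, a]) * sum_y P[a, s, y] * h[y]
    q = np.exp(c) * np.einsum("asy,y->sa", P, h)
    return q.argmin(axis=1), q


def mpi_risk_sensitive(P, c, m=5, iters=200):
    """Hypothetical MPI sketch: m evaluation sweeps per improvement step."""
    n_states = c.shape[0]
    idx = np.arange(n_states)
    h = np.ones(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(iters):
        # Policy improvement: greedy with respect to the current h.
        policy, _ = greedy_policy(P, c, h)
        P_pi = P[policy, idx, :]       # (S, S) transition matrix under policy
        g_pi = np.exp(c[idx, policy])  # (S,) multiplicative stage cost
        # Partial policy evaluation: m sweeps of the fixed-policy operator.
        for _ in range(m):
            h = g_pi * (P_pi @ h)
            h = h / h[0]  # assumed normalization at reference state 0
    # Estimate the multiplicative eigenvalue at the reference state.
    _, q = greedy_policy(P, c, h)
    lam = q.min(axis=1)[0] / h[0]
    return policy, h, float(np.log(lam))


# Toy instance (made up for illustration): both actions share the same
# transitions, and action 0 is cheaper in every state.
P_demo = np.full((2, 2, 2), 0.5)
c_demo = np.array([[0.2, 1.0], [0.2, 1.0]])
policy_demo, h_demo, rate_demo = mpi_risk_sensitive(P_demo, c_demo)
```

Setting m = 1 makes each iteration resemble a normalized value iteration step, while large m approaches full evaluation of the greedy policy, which is the sense in which MPI interpolates between the two methods. The sketch makes no claim about convergence conditions; those are the subject of the paper.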

Funding: This work was supported by the National Science Foundation (NSF) Grants Division of Computing and Communication Foundations (CCF) [Grant 22-07547], the Division of Computer and Network Systems (CNS) [Grant 23-12714], and the Air Force Office of Scientific Research (AFOSR) [Grant FA9550-24-1-0002].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.0818.
