On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes

Yashaswini Murthy (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-8788-6873
Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California 91125

Mehrdad Moharrami
[email protected]
https://orcid.org/0000-0003-3907-8406
Computer Science, University of Iowa, Iowa City, Iowa 52242

Rayadurgam Srikant
[email protected]
https://orcid.org/0000-0003-1483-5204
Electrical and Computer Engineering and Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Champaign, Illinois 61820

Published Online: 27 Nov 2025
https://doi.org/10.1287/opre.2024.0818

Abstract

Modified policy iteration (MPI) is a dynamic programming algorithm that combines elements of policy iteration and value iteration. The convergence of MPI is well studied in the context of discounted and average-cost Markov decision processes (MDPs). In this work, we consider the exponential cost risk-sensitive MDP formulation, which is known to provide some robustness to model parameters. Although policy iteration and value iteration are well studied in the context of risk-sensitive MDPs, MPI remains unexplored in this setting. To the best of our knowledge, we provide the first proof that MPI also converges for the risk-sensitive problem in the case of finite state and action spaces. Because the exponential cost formulation deals with the multiplicative Bellman equation, our main contribution is a convergence proof that is quite different from existing results for discounted and risk-neutral average-cost problems, as well as from risk-sensitive value and policy iteration approaches.
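
To make the interplay between policy improvement and partial policy evaluation concrete, the following is a minimal Python sketch of an MPI-style loop for a finite exponential cost MDP. It is an illustration under stated assumptions, not the authors' algorithm or proof setting: it assumes a risk parameter `alpha > 0`, a multiplicative policy operator of the form (T_mu h)(i) = exp(alpha c(i, mu(i))) sum_j P(j | i, mu(i)) h(j), and a rescaling of `h` at a reference state `ref` to keep the iterates bounded. All function names, the parameters `m`, `outer_iters`, and `ref`, and the fixed iteration counts are hypothetical choices made for this example, and the sketch does not itself establish convergence.

```python
import math


def policy_operator(P, c, h, policy, alpha):
    """Apply the (assumed) multiplicative operator of a fixed policy once:
    (T_mu h)(i) = exp(alpha * c[i][a]) * sum_j P[i][a][j] * h[j], a = policy[i]."""
    n = len(h)
    return [
        math.exp(alpha * c[i][policy[i]])
        * sum(P[i][policy[i]][j] * h[j] for j in range(n))
        for i in range(n)
    ]


def greedy_policy(P, c, h, alpha):
    """Policy improvement: in each state, pick the action with the smallest
    one-step multiplicative cost (ties go to the lowest action index)."""
    n = len(h)
    policy = []
    for i in range(n):
        def one_step(a, i=i):
            return math.exp(alpha * c[i][a]) * sum(
                P[i][a][j] * h[j] for j in range(n)
            )
        policy.append(min(range(len(c[i])), key=one_step))
    return policy


def modified_policy_iteration(P, c, alpha, m=5, outer_iters=100, ref=0):
    """Illustrative MPI loop: one greedy improvement step followed by m
    applications of the improved policy's operator (partial evaluation).

    P[i][a][j]: transition probability, c[i][a]: one-step cost.
    Requires m >= 1. Returns (policy, h, estimated risk-sensitive average cost).
    """
    n = len(c)
    h = [1.0] * n          # positive initial guess
    policy, rho = [0] * n, 1.0
    for _ in range(outer_iters):
        policy = greedy_policy(P, c, h, alpha)
        for _ in range(m):
            h = policy_operator(P, c, h, policy, alpha)
            # The operator is positively homogeneous, so rescaling h does
            # not change which actions are greedy; it only bounds the iterates.
            rho = h[ref]
            h = [x / rho for x in h]
    # If h has settled, rho approximates exp(alpha * average cost).
    return policy, h, math.log(rho) / alpha
```

In this sketch, `m = 1` makes each outer iteration resemble a single value iteration step, while larger `m` moves the inner loop closer to a full evaluation of the current policy, which is the sense in which MPI sits between the two methods.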

Funding: This work was supported by the National Science Foundation (NSF) Grants Division of Computing and Communication Foundations (CCF) [Grant 22-07547], the Division of Computer and Network Systems (CNS) [Grant 23-12714], and the Air Force Office of Scientific Research (AFOSR) [Grant FA9550-24-1-0002].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2024.0818.

Volume 74, Issue 3

May-June 2026

Pages v-x, 1153-1728, iii-iv

Information

Received: February 16, 2024
Accepted: September 30, 2025
Published Online: November 27, 2025

Cite as

Yashaswini Murthy, Mehrdad Moharrami, Rayadurgam Srikant (2025) On the Convergence of Modified Policy Iteration in Risk-Sensitive Exponential Cost Markov Decision Processes. Operations Research 74(3):1425-1436.

https://doi.org/10.1287/opre.2024.0818

Acknowledgments

The authors thank the anonymous reviewers, associate editor, and area editor for their helpful feedback in refining the manuscript.
