A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP

Mehrdad Moharrami (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-3907-8406
Computer Science Department, University of Iowa, Iowa City, Iowa 52242

Yashaswini Murthy
[email protected]
https://orcid.org/0000-0002-8788-6873
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801; and Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801

Arghyadip Roy
[email protected]
https://orcid.org/0000-0001-9955-9514
Mehta Family School of Data Science and Artificial Intelligence, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India

R. Srikant
[email protected]
https://orcid.org/0000-0003-1483-5204
Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801; and Department of Electrical & Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801


Published Online: 11 Mar 2024
https://doi.org/10.1287/moor.2022.0139

Abstract

We study the risk-sensitive exponential cost Markov decision process (MDP) formulation and develop a trajectory-based gradient algorithm to find a stationary point of the cost associated with a set of parameterized policies. We derive a formula that can be used to compute the policy gradient from (state, action, cost) information collected from sample paths of the MDP for each fixed parameterized policy. Unlike in the traditional average cost problem, standard stochastic approximation theory cannot be used to exploit this formula. To address this issue, we introduce a truncated and smooth version of the risk-sensitive cost and show that, under some mild assumptions, this new cost criterion can be used to approximate the risk-sensitive cost and its gradient uniformly. We then develop a trajectory-based gradient algorithm to minimize the smooth truncated approximation of the risk-sensitive cost and derive conditions under which a sequence of truncations can be used to solve the original, untruncated cost problem.
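The abstract refers to the risk-sensitive exponential cost of a fixed policy without stating it. The following is an illustrative sketch only; it is not the paper's gradient algorithm. It assumes a finite MDP, a stationary randomized policy pi(a|x), a risk parameter normalized to 1, and an irreducible induced chain. Under those assumptions, the long-run cost J(pi) = lim_{T→∞} (1/T) log E[exp(Σ_{t=0}^{T-1} c(X_t, A_t))] equals the log of the spectral radius of the "twisted" matrix M(x, y) = Σ_a pi(a|x) exp(c(x, a)) P(y | x, a). This follows because E_x[exp(Σ_{t<T} c)] = (M^T 1)(x). The function names and the toy data are hypothetical, and the paper's exact conventions may differ.

```python
import numpy as np

def twisted_kernel(P, c, pi):
    """Hypothetical helper: M(x, y) = sum_a pi(a|x) * exp(c(x, a)) * P(y | x, a).

    P:  shape (A, S, S), P[a, x, y] = Pr(next state y | state x, action a)
    c:  shape (S, A), one-step cost c(x, a)
    pi: shape (S, A), stationary randomized policy pi(a | x)
    """
    weights = pi * np.exp(c)                     # shape (S, A)
    return np.einsum("xa,axy->xy", weights, P)   # sum over actions a

def risk_sensitive_cost(P, c, pi):
    """Long-run exponential cost for an irreducible chain (assumption):
    log of the Perron-Frobenius eigenvalue (spectral radius) of M."""
    M = twisted_kernel(P, c, pi)
    return float(np.log(np.max(np.abs(np.linalg.eigvals(M)))))

def finite_horizon_cost(P, c, pi, x0, T):
    """(1/T) log E_{x0}[exp(sum_{t<T} c(X_t, A_t))] = (1/T) log (M^T 1)(x0).

    The vector is rescaled after every product with M so that large
    costs or long horizons do not overflow; the log scale is tracked separately.
    """
    M = twisted_kernel(P, c, pi)
    v = np.ones(M.shape[0])
    log_scale = 0.0
    for _ in range(T):
        v = M @ v
        s = v.max()
        v /= s
        log_scale += np.log(s)
    return float((log_scale + np.log(v[x0])) / T)
```

As a usage example, take a two-state, two-action toy MDP with a uniform policy and constant cost 0.5. Then M is e^0.5 times a stochastic matrix, so J(pi) = 0.5. For non-constant costs, finite_horizon_cost approaches risk_sensitive_cost as T grows. The sample-path estimation problem studied in the paper arises precisely because this expectation must be estimated from trajectories rather than computed from a known M.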

Funding: This work was supported by the Office of Naval Research Global [Grant N00014-19-1-2566], the Division of Computer and Network Systems [Grant 21-06801], the Army Research Office [Grant W911NF-19-1-0379], and the Division of Computing and Communication Foundations [Grants 17-04970 and 19-34986].

Mathematics of Operations Research

Volume 50, Issue 1

February 2025

Pages 1-781 C2


Received: May 17, 2022
Accepted: January 01, 2024
Published Online: March 11, 2024

Cite as

Mehrdad Moharrami, Yashaswini Murthy, Arghyadip Roy, R. Srikant (2024) A Policy Gradient Algorithm for the Risk-Sensitive Exponential Cost MDP. Mathematics of Operations Research 50(1):431-458.

https://doi.org/10.1287/moor.2022.0139


Acknowledgments

The work presented here was supported in part by the NSF grants CCF 19-34986, CNS 21-06801, CCF 17-04970, ARO Grant W911NF-19-1-0379, and ONR grant N00014-19-1-2566.
