Quantile Markov Decision Processes

Xiaocheng Li (Corresponding Author)
https://orcid.org/0000-0001-6155-9068
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Huaiyang Zhong
https://orcid.org/0000-0002-2902-1644
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Margaret L. Brandeau
https://orcid.org/0000-0001-9331-8920
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Published Online: 9 Nov 2021
https://doi.org/10.1287/opre.2021.2123

Abstract

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper, we consider the problem of optimizing the quantiles of the cumulative rewards of an MDP, which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, in which patients aim to balance the potential benefits and risks of the treatment.
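To make the distinction between the two objectives concrete, the following is a minimal sketch that compares the expectation and a quantile of the cumulative reward for two fixed policies on a toy problem. It is not the dynamic programming algorithm of the paper; it only evaluates given policies by exact enumeration. The one-state MDP, the action names `safe` and `risky`, the reward values, the helper names, and the lower-quantile convention (smallest value g with P(G <= g) >= tau) are all assumptions made for illustration.

```python
from collections import defaultdict


def return_distribution(P, policy, s0, horizon):
    """Exact distribution of cumulative reward under a fixed stationary policy.

    P[s][a] is a list of (next_state, probability, reward) triples.
    Returns a dict mapping each possible cumulative reward to its probability.
    """
    dist = {(s0, 0): 1.0}  # (state, reward so far) -> probability
    for _ in range(horizon):
        nxt = defaultdict(float)
        for (s, g), p in dist.items():
            for s2, q, r in P[s][policy[s]]:
                nxt[(s2, g + r)] += p * q
        dist = nxt
    out = defaultdict(float)
    for (_, g), p in dist.items():
        out[g] += p  # marginalize out the final state
    return dict(out)


def mean(dist):
    """Expected cumulative reward."""
    return sum(g * p for g, p in dist.items())


def quantile(dist, tau):
    """Lower tau-quantile: smallest g with P(G <= g) >= tau (assumed convention)."""
    acc = 0.0
    for g in sorted(dist):
        acc += dist[g]
        if acc >= tau - 1e-12:  # small tolerance for floating-point sums
            return g
    return max(dist)


# Hypothetical one-state MDP: "safe" pays 1 for sure; "risky" pays 5 w.p. 0.3, else 0.
P = {
    "s": {
        "safe": [("s", 1.0, 1)],
        "risky": [("s", 0.3, 5), ("s", 0.7, 0)],
    }
}

safe = return_distribution(P, {"s": "safe"}, "s", horizon=2)
risky = return_distribution(P, {"s": "risky"}, "s", horizon=2)

print("mean:", mean(safe), mean(risky))
print("0.25-quantile:", quantile(safe, 0.25), quantile(risky, 0.25))
```

In this toy instance the risky policy has the higher expected cumulative reward, while the safe policy has the higher 0.25-quantile, so the two objectives rank the policies differently.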

Volume 70, Issue 3

May-June 2022

Pages iii-viii, 1293-1952, C2-C3

Information

Received: May 31, 2018
Accepted: December 02, 2020
Published Online: November 09, 2021

Cite as

Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau (2021) Quantile Markov Decision Processes. Operations Research 70(3):1428-1447.

https://doi.org/10.1287/opre.2021.2123
