Quantile Markov Decision Processes

Xiaocheng Li
Corresponding Author
https://orcid.org/0000-0001-6155-9068
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Huaiyang Zhong
https://orcid.org/0000-0002-2902-1644
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Margaret L. Brandeau
https://orcid.org/0000-0001-9331-8920
Department of Management Science and Engineering, Stanford University, Stanford, California 94305

Published Online: 9 Nov 2021
https://doi.org/10.1287/opre.2021.2123

References

Altman E (1999) Constrained Markov Decision Processes (CRC Press, New York).
Arlotto A, Gans N, Steele JM (2014) Markov decision problems where means bound variances. Oper. Res. 62(4):864–875.
Austin PC, Tu JV, Daly PA, Alter DA (2005) The use of quantile regression in healthcare research: A case study examining gender differences in the timeliness of thrombolytic therapy. Statist. Medicine 24(5):791–816.
Bauerle N, Ott J (2011) Markov decision processes with average-value-at-risk criteria. Math. Methods Oper. Res. 74(3):361–379.
Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (IMLS, Sydney, Australia), 449–458.
Berkowitz J, O’Brien J (2002) How accurate are value-at-risk models at commercial banks? J. Finance 57(3):1093–1111.
Bertsekas DP (1995) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).
Beyerlein A (2014) Quantile regression—Opportunities and challenges from a user’s perspective. Amer. J. Epidemiology 180(3):330–331.
Carpin S, Chow YL, Pavone M (2016) Risk aversion in finite Markov decision processes using total cost criteria and average value at risk. Proc. 2016 IEEE Internat. Conf. Robotics Automation (IEEE Robotics and Automation Society, New York), 335–342.
Cheridito P, Stadje M (2009) Time-inconsistency of VaR and time-consistent alternatives. Finance Res. Lett. 6(1):40–46.
Chow Y (2017) Risk-sensitive and data-driven sequential decision making. Unpublished PhD thesis, Institute for Computational and Mathematical Engineering, Stanford University, CA.
Chow Y, Ghavamzadeh M (2014) Algorithms for CVaR optimization in MDPs. Adv. Neural Inform. Processing Systems (Montreal, Canada), 2:3509–3517.
Chow Y, Tamar A, Mannor S, Pavone M (2015) Risk-sensitive and robust decision-making: A CVaR optimization approach. Adv. Neural Inform. Processing Systems (Montreal, Canada), 1:1522–1530.
Dabney W, Rowland M, Bellemare MG, Munos R (2018) Distributional reinforcement learning with quantile regression. 32nd AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Palo Alto, CA), 2892–2901.
DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Systems Rev. 41(6):205–220.
Delage E, Mannor S (2010) Percentile optimization for Markov decision processes with parameter uncertainty. Oper. Res. 58(1):203–213.
Di Castro D, Tamar A, Mannor S (2012) Policy gradients with variance related risk criteria. Preprint, submitted June 27, https://arxiv.org/abs/1206.6404.
Duffie D, Pan J (1997) An overview of value at risk. J. Derivatives 4(3):7–49.
Ermon S, Gomes C, Selman B, Vladimirsky A (2012) Probabilistic planning with non-linear utility functions and worst-case guarantees. Proc. 11th Internat. Conf. Autonomous Agents Multiagent Systems, vol. 2 (Association for Computing Machinery, Valencia, Spain), 965–972.
Filar JA, Krass D, Ross KW (1995) Percentile performance criteria for limiting average Markov decision processes. IEEE Trans. Automatic Control 40(1):2–10.
Fraenkel L, Bogardus ST Jr, Wittink DR (2003) Risk-attitude and patient treatment preferences. Lupus 12(5):370–376.
Freiberg MS, So-Armah K (2016) HIV and cardiovascular disease: We need a mechanism, and we need a plan. J. Amer. Heart Assoc. 5(3):e003411.
Gilbert H, Weng P, Xu Y (2016) Optimizing quantiles in preference-based Markov decision processes. Preprint, submitted December 1, https://arxiv.org/abs/1612.00094.
Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Management Sci. 18(7):356–369.
Iancu DA, Petrik M, Subramanian D (2015) Tight approximations of dynamic risk measures. Math. Oper. Res. 40(3):655–682.
Jiang DR, Powell WB (2018) Risk-averse approximate dynamic programming with quantile-based risk measures. Math. Oper. Res. 43(2):554–579.
Mannor S, Tsitsiklis J (2011) Mean-variance optimization in Markov decision processes. Preprint, submitted April 29, https://arxiv.org/abs/1104.5601.
Mason JE, Denton BT, Shah ND, Smith SA (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.
Negoescu DM, Owens DK, Brandeau ML, Bendavid E (2012) Balancing immunological benefits and cardiovascular risks of antiretroviral therapy: When is immediate treatment optimal? Clinical Infectious Diseases 55(10):1392–1399.
Nemirovski A, Shapiro A (2007) Convex approximations of chance constrained programs. SIAM J. Optim. 17(4):969–996.
Nilim A, El Ghaoui L (2005) Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5):780–798.
Pflug GC, Pichler A (2016) Time-consistent decisions and temporal decomposition of coherent risk functionals. Math. Oper. Res. 41(2):682–699.
Piunovskiy AB (2006) Dynamic programming in constrained Markov decision processes. Control Cybernetics 35(3):645–660.
Rockafellar RT, Uryasev S (2002) Conditional value-at-risk for general loss distributions. J. Banking Finance 26(7):1443–1471.
Ruszczyński A (2010) Risk-averse dynamic programming for Markov decision processes. Math. Programming 125(2):235–261.
Sennott L (1989) Average cost semi-Markov decision processes and the control of queueing systems. Probab. Engrg. Inform. Sci. 3(2):247–272.
Shapiro A, Tekaya W, Paulo da Costa J, Soares MP (2013) Risk neutral and risk averse stochastic dual dynamic programming method. Eur. J. Oper. Res. 224(2):375–391.
Shechter SM, Bailey MD, Schaefer AJ, Roberts MS (2008) The optimal time to initiate HIV therapy under ordered health states. Oper. Res. 56(1):20–33.
Tamar A, Di Castro D, Mannor S (2012) Policy gradients with variance-related risk criteria. Proc. 29th Internat. Conf. Machine Learn. (IMLS, Edinburgh, Scotland), 1651–1658.
Tanser F, Bärnighausen T, Grapsa E, Zaidi J, Newell ML (2013) High coverage of ART associated with decline in risk of HIV acquisition in rural KwaZulu-Natal, South Africa. Science 339(6122):966–971.
Tsitsiklis JN, van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Automatic Control 44(10):1840–1851.
Ummels M, Baier C (2013) Computing quantiles in Markov reward models. Internat. Conf. Foundations Software Sci. Comput. Structures (Foundations of Software Science and Computational Structures, Rome, Italy), 353–368.
Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.
World Health Organization (2018) Global health observatory data: HIV/AIDS. Accessed September 1, 2019, http://www.who.int/gho/hiv/en/.
Yang D, Zhao L, Lin Z, Qin T, Bian J, Liu TY (2019) Fully parameterized quantile function for distributional reinforcement learning. Adv. Neural Inform. Processing Systems 33:6190–6199.
Yu P, Haskell WB, Xu H (2017) Dynamic programming for risk-aware sequential optimization. 2017 IEEE 56th Annual Conf. Decision Control (IEEE, New York), 4934–4939.
Zhang Y, Steimle LM, Denton BT (2015) Robust Markov decision processes for medical treatment decisions. Working paper, University of Michigan, Ann Arbor.
Zhong H, Arjmand IS, Brandeau ML, Bendavid E (2021) Health outcomes and cost-effectiveness of treating depression in people with HIV in Sub-Saharan Africa: A model-based analysis. AIDS Care 33(4):441–447.

Volume 70, Issue 3

May-June 2022

Pages iii-viii, 1293-1952, C2-C3

Article Information

Received: May 31, 2018
Accepted: December 02, 2020
Published Online: November 09, 2021

Cite as

Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau (2021) Quantile Markov Decision Processes. Operations Research 70(3):1428-1447.

https://doi.org/10.1287/opre.2021.2123