Quantile Markov Decision Processes

Published Online: https://doi.org/10.1287/opre.2021.2123

References

  • Altman E (1999) Constrained Markov Decision Processes (CRC Press, New York).
  • Arlotto A, Gans N, Steele JM (2014) Markov decision problems where means bound variances. Oper. Res. 62(4):864–875.
  • Austin PC, Tu JV, Daly PA, Alter DA (2005) The use of quantile regression in healthcare research: A case study examining gender differences in the timeliness of thrombolytic therapy. Statist. Medicine 24(5):791–816.
  • Bäuerle N, Ott J (2011) Markov decision processes with average-value-at-risk criteria. Math. Methods Oper. Res. 74(3):361–379.
  • Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (IMLS, Sydney, Australia), 449–458.
  • Berkowitz J, O’Brien J (2002) How accurate are value-at-risk models at commercial banks? J. Finance 57(3):1093–1111.
  • Bertsekas DP (1995) Dynamic Programming and Optimal Control, vol. 1 (Athena Scientific, Belmont, MA).
  • Beyerlein A (2014) Quantile regression—Opportunities and challenges from a user’s perspective. Amer. J. Epidemiology 180(3):330–331.
  • Carpin S, Chow YL, Pavone M (2016) Risk aversion in finite Markov decision processes using total cost criteria and average value at risk. Proc. 2016 IEEE Internat. Conf. Robotics Automation (IEEE Robotics and Automation Society, New York), 335–342.
  • Cheridito P, Stadje M (2009) Time-inconsistency of VaR and time-consistent alternatives. Finance Res. Lett. 6(1):40–46.
  • Chow Y (2017) Risk-sensitive and data-driven sequential decision making. Unpublished PhD thesis, Institute for Computational and Mathematical Engineering, Stanford University, CA.
  • Chow Y, Ghavamzadeh M (2014) Algorithms for CVaR optimization in MDPs. Adv. Neural Inform. Processing Systems (Montreal, Canada), 2:3509–3517.
  • Chow Y, Tamar A, Mannor S, Pavone M (2015) Risk-sensitive and robust decision-making: A CVaR optimization approach. Adv. Neural Inform. Processing Systems (Montreal, Canada), 1:1522–1530.
  • Dabney W, Rowland M, Bellemare MG, Munos R (2018) Distributional reinforcement learning with quantile regression. 32nd AAAI Conf. Artificial Intelligence (Association for the Advancement of Artificial Intelligence, Palo Alto, CA), 2892–2901.
  • DeCandia G, Hastorun D, Jampani M, Kakulapati G, Lakshman A, Pilchin A, Sivasubramanian S, Vosshall P, Vogels W (2007) Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Systems Rev. 41(6):205–220.
  • Delage E, Mannor S (2010) Percentile optimization for Markov decision processes with parameter uncertainty. Oper. Res. 58(1):203–213.
  • Di Castro D, Tamar A, Mannor S (2012) Policy gradients with variance related risk criteria. Preprint, submitted June 27, https://arxiv.org/abs/1206.6404.
  • Duffie D, Pan J (1997) An overview of value at risk. J. Derivatives 4(3):7–49.
  • Ermon S, Gomes C, Selman B, Vladimirsky A (2012) Probabilistic planning with non-linear utility functions and worst-case guarantees. Proc. 11th Internat. Conf. Autonomous Agents Multiagent Systems, vol. 2 (Association for Computing Machinery, Valencia, Spain), 965–972.
  • Filar JA, Krass D, Ross KW (1995) Percentile performance criteria for limiting average Markov decision processes. IEEE Trans. Automatic Control 40(1):2–10.
  • Fraenkel L, Bogardus ST Jr, Wittink DR (2003) Risk-attitude and patient treatment preferences. Lupus 12(5):370–376.
  • Freiberg MS, So-Armah K (2016) HIV and cardiovascular disease: We need a mechanism, and we need a plan. J. Amer. Heart Assoc. 5(3):e003411.
  • Gilbert H, Weng P, Xu Y (2016) Optimizing quantiles in preference-based Markov decision processes. Preprint, submitted December 1, https://arxiv.org/abs/1612.00094.
  • Howard RA, Matheson JE (1972) Risk-sensitive Markov decision processes. Management Sci. 18(7):356–369.
  • Iancu DA, Petrik M, Subramanian D (2015) Tight approximations of dynamic risk measures. Math. Oper. Res. 40(3):655–682.
  • Jiang DR, Powell WB (2018) Risk-averse approximate dynamic programming with quantile-based risk measures. Math. Oper. Res. 43(2):554–579.
  • Mannor S, Tsitsiklis J (2011) Mean-variance optimization in Markov decision processes. Preprint, submitted April 29, https://arxiv.org/abs/1104.5601.
  • Mason JE, Denton BT, Shah ND, Smith SA (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.
  • Negoescu DM, Owens DK, Brandeau ML, Bendavid E (2012) Balancing immunological benefits and cardiovascular risks of antiretroviral therapy: When is immediate treatment optimal? Clinical Infectious Diseases 55(10):1392–1399.
  • Nemirovski A, Shapiro A (2007) Convex approximations of chance constrained programs. SIAM J. Optim. 17(4):969–996.
  • Nilim A, El Ghaoui L (2005) Robust control of Markov decision processes with uncertain transition matrices. Oper. Res. 53(5):780–798.
  • Pflug GC, Pichler A (2016) Time-consistent decisions and temporal decomposition of coherent risk functionals. Math. Oper. Res. 41(2):682–699.
  • Piunovskiy AB (2006) Dynamic programming in constrained Markov decision processes. Control Cybernetics 35(3):645–660.
  • Rockafellar RT, Uryasev S (2002) Conditional value-at-risk for general loss distributions. J. Banking Finance 26(7):1443–1471.
  • Ruszczyński A (2010) Risk-averse dynamic programming for Markov decision processes. Math. Programming 125(2):235–261.
  • Sennott L (1989) Average cost semi-Markov decision processes and the control of queueing systems. Probab. Engrg. Inform. Sci. 3(2):247–272.
  • Shapiro A, Tekaya W, Paulo da Costa J, Soares MP (2013) Risk neutral and risk averse stochastic dual dynamic programming method. Eur. J. Oper. Res. 224(2):375–391.
  • Shechter SM, Bailey MD, Schaefer AJ, Roberts MS (2008) The optimal time to initiate HIV therapy under ordered health states. Oper. Res. 56(1):20–33.
  • Tamar A, Di Castro D, Mannor S (2012) Policy gradients with variance-related risk criteria. Proc. 29th Internat. Conf. Machine Learn. (IMLS, Edinburgh, Scotland), 1651–1658.
  • Tanser F, Bärnighausen T, Grapsa E, Zaidi J, Newell ML (2013) High coverage of ART associated with decline in risk of HIV acquisition in rural KwaZulu-Natal, South Africa. Science 339(6122):966–971.
  • Tsitsiklis JN, van Roy B (1999) Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Trans. Automatic Control 44(10):1840–1851.
  • Ummels M, Baier C (2013) Computing quantiles in Markov reward models. Internat. Conf. Foundations Software Sci. Comput. Structures (Foundations of Software Science and Computational Structures, Rome, Italy), 353–368.
  • Wiesemann W, Kuhn D, Rustem B (2013) Robust Markov decision processes. Math. Oper. Res. 38(1):153–183.
  • World Health Organization (2018) Global health observatory data: HIV/AIDS. Accessed September 1, 2019, http://www.who.int/gho/hiv/en/.
  • Yang D, Zhao L, Lin Z, Qin T, Bian J, Liu TY (2019) Fully parameterized quantile function for distributional reinforcement learning. Adv. Neural Inform. Processing Systems 33:6190–6199.
  • Yu P, Haskell WB, Xu H (2017) Dynamic programming for risk-aware sequential optimization. 2017 IEEE 56th Annual Conf. Decision Control (IEEE, New York), 4934–4939.
  • Zhang Y, Steimle LM, Denton BT (2015) Robust Markov decision processes for medical treatment decisions. Working paper, University of Michigan, Ann Arbor.
  • Zhong H, Arjmand IS, Brandeau ML, Bendavid E (2021) Health outcomes and cost-effectiveness of treating depression in people with HIV in Sub-Saharan Africa: A model-based analysis. AIDS Care 33(4):441–447.