Quantile Markov Decision Processes
Published Online: 9 Nov 2021
https://doi.org/10.1287/opre.2021.2123

