Global Algorithms for Mean-Variance Optimization in Markov Decision Processes
Published Online: 25 Feb 2025, https://doi.org/10.1287/moor.2023.0176
References
- [1] (2020) Risk-averse trust region optimization for reward-volatility reduction. Bessiere C, ed. IJCAI’20 Proc. 29th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence Organization, Yokohama, Japan), 4583–4589.
- [2] (2010) Learning algorithms for risk-sensitive control. Edelmayer A, ed. Proc. 19th Internat. Sympos. Math. Theory Networks Systems (MTNS’2010) (International Joint Conferences on Artificial Intelligence Organization, Budapest), 1327–1332.
- [3] (2007) Stochastic Learning and Optimization: A Sensitivity-Based Approach (Springer, New York).
- [4] (1994) Mean-variance tradeoffs in an undiscounted MDP: The unichain case. Oper. Res. 42(1):184–188.
- [5] (2022) Survey on multi-period mean-variance portfolio selection model. J. Oper. Res. Soc. China 10:599–622.
- [6] (2021) A dynamic mean-variance analysis for log returns. Management Sci. 67(2):1093–1108.
- [7] (1985) Gain/variability tradeoffs in undiscounted Markov decision processes. Proc. 24th IEEE Conf. Decision Control (CDC’1985) (IEEE, Piscataway, NJ), 1106–1112.
- [8] Gal T, Greenberg HJ, eds. (1997) Advances in Sensitivity Analysis and Parametric Programming (Springer, New York).
- [9] (2013) Optimal cardinality constrained portfolio selection. Oper. Res. 61(3):745–761.
- [10] (2009) Mean-variance criteria for finite continuous-time Markov decision processes. IEEE Trans. Automatic Control 54(9):2151–2157.
- [11] (2012) A mean-variance optimization problem for discounted Markov decision processes. Eur. J. Oper. Res. 220(2):423–429.
- [12] (2013) Stochastic dominance-constrained Markov decision processes. SIAM J. Control Optim. 51(1):273–303.
- [13] (1999) Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optim. 38(1):79–93.
- [14] (2018) Finite horizon continuous-time Markov decision processes with mean and variance criteria. Discrete Event Dynamic Systems 28(4):539–564.
- [15] (2000) Optimal dynamic portfolio selection: Multiperiod mean-variance formulation. Math. Finance 10(3):387–406.
- [16] (2023) A unified algorithm framework for mean-variance optimization in discounted Markov decision processes. Eur. J. Oper. Res. 311(3):1057–1067.
- [17] (1952) Portfolio selection. J. Finance 7(1):77–91.
- [18] (2013) Actor-critic algorithms for risk-sensitive MDPs. NIPS’13 Proc. 27th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (Curran Associates Inc., Red Hook, NY), 252–260.
- [19] (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).
- [20] (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–42.
- [21] (1982) The variance of discounted Markov decision processes. J. Appl. Probab. 19(4):794–802.
- [22] (1994) Mean-variance tradeoffs in an undiscounted MDP. Oper. Res. 42(1):175–183.
- [23] (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
- [24] (2012) Policy gradients with variance related risk criteria. Proc. 29th Internat. Conf. Machine Learning (ICML’2012) (Omnipress, Madison, WI), 387–396.
- [25] (2011) Sensitivity analysis in Markov decision processes with uncertain reward parameters. J. Appl. Probab. 48(4):954–967.
- [26] (2016) Optimization of Markov decision processes under the variance criterion. Automatica 73:269–278.
- [27] (2018) Mean-variance optimization of discrete time discounted Markov decision processes. Automatica 88:76–82.
- [28] (2020) Risk-sensitive Markov decision processes with combined metrics of mean and variance. Production Oper. Management 29(12):2808–2827.
- [29] (2016) A generalized fundamental matrix for computing fundamental quantities of Markov systems. Preprint, submitted April 15, https://arxiv.org/abs/1604.04343.
- [30] (2018) A block coordinate ascent algorithm for mean-variance optimization. NIPS’18 Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1065–1075.
- [31] (2021) Mean-variance policy iteration for risk-averse reinforcement learning. Proc. AAAI Conf. Artificial Intelligence 35(12):10905–10913.
- [32] (2000) Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Appl. Math. Optim. 42:19–33.
- [33] (2004) Markowitz’s mean-variance portfolio selection with regime switching: A continuous-time model. SIAM J. Control Optim. 42(4):1466–1482.

