Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Li Xia
[email protected]
https://orcid.org/0000-0001-9141-2569
School of Business, Sun Yat-sen University, Guangzhou 510275, China

Shuai Ma (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-3452-1739
School of Business, Sun Yat-sen University, Guangzhou 510275, China

Published Online: 25 Feb 2025
https://doi.org/10.1287/moor.2023.0176

Mathematics of Operations Research

Volume 51, Issue 1

February 2026

Pages iv-viii, 1-851

Received: June 06, 2023
Accepted: January 25, 2025
Published Online: February 25, 2025

Cite as

Li Xia, Shuai Ma (2025) Global Algorithms for Mean-Variance Optimization in Markov Decision Processes. Mathematics of Operations Research 51(1):440-455.

https://doi.org/10.1287/moor.2023.0176
