Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Published Online: https://doi.org/10.1287/moor.2023.0176

References

  • [1] Bisi L, Sabbioni L, Vittori E, Papini M, Restelli M (2020) Risk-averse trust region optimization for reward-volatility reduction. Bessiere C, ed. IJCAI’20 Proc. 29th Internat. Joint Conf. Artificial Intelligence (International Joint Conferences on Artificial Intelligence Organization, Yokohama, Japan), 4583–4589.
  • [2] Borkar V (2010) Learning algorithms for risk-sensitive control. Edelmayer A, ed. Proc. 19th Internat. Sympos. Math. Theory Networks Systems (MTNS’2010) (International Joint Conferences on Artificial Intelligence Organization, Budapest), 1327–1332.
  • [3] Cao XR (2007) Stochastic Learning and Optimization: A Sensitivity-Based Approach (Springer, New York).
  • [4] Chung KJ (1994) Mean-variance tradeoffs in an undiscounted MDP: The unichain case. Oper. Res. 42(1):184–188.
  • [5] Cui XY, Gao J, Li X, Shi Y (2022) Survey on multi-period mean-variance portfolio selection model. J. Oper. Res. Soc. China 10:599–622.
  • [6] Dai M, Jin H, Kou S, Xu Y (2021) A dynamic mean-variance analysis for log returns. Management Sci. 67(2):1093–1108.
  • [7] Filar JA, Lee HM (1985) Gain/variability tradeoffs in undiscounted Markov decision processes. Proc. 24th IEEE Conf. Decision Control (CDC’1985) (IEEE, Piscataway, NJ), 1106–1112.
  • [8] Gal T, Greenberg HJ, eds. (1997) Advances in Sensitivity Analysis and Parametric Programming (Springer, New York).
  • [9] Gao J, Li D (2013) Optimal cardinality constrained portfolio selection. Oper. Res. 61(3):745–761.
  • [10] Guo X, Song XY (2009) Mean-variance criteria for finite continuous-time Markov decision processes. IEEE Trans. Automatic Control 54(9):2151–2157.
  • [11] Guo X, Ye L, Yin G (2012) A mean-variance optimization problem for discounted Markov decision processes. Eur. J. Oper. Res. 220(2):423–429.
  • [12] Haskell WB, Jain R (2013) Stochastic dominance-constrained Markov decision processes. SIAM J. Control Optim. 51(1):273–303.
  • [13] Hernández-Lerma O, Vega-Amaya O, Carrasco G (1999) Sample-path optimality and variance-minimization of average cost Markov control processes. SIAM J. Control Optim. 38(1):79–93.
  • [14] Huang Y (2018) Finite horizon continuous-time Markov decision processes with mean and variance criteria. Discrete Event Dynamic Systems 28(4):539–564.
  • [15] Li D, Ng WL (2000) Optimal dynamic portfolio selection: Multiperiod mean-variance formulation. Math. Finance 10(3):387–406.
  • [16] Ma S, Ma X, Xia L (2023) A unified algorithm framework for mean-variance optimization in discounted Markov decision processes. Eur. J. Oper. Res. 311(3):1057–1067.
  • [17] Markowitz H (1952) Portfolio selection. J. Finance 7(1):77–91.
  • [18] Prashanth LA, Ghavamzadeh M (2013) Actor-critic algorithms for risk-sensitive MDPs. NIPS’13 Proc. 27th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (Curran Associates Inc., Red Hook, NY), 252–260.
  • [19] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).
  • [20] Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–42.
  • [21] Sobel MJ (1982) The variance of discounted Markov decision processes. J. Appl. Probab. 19(4):794–802.
  • [22] Sobel MJ (1994) Mean-variance tradeoffs in an undiscounted MDP. Oper. Res. 42(1):175–183.
  • [23] Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction, 2nd ed. (MIT Press, Cambridge, MA).
  • [24] Tamar A, Castro DD, Mannor S (2012) Policy gradients with variance related risk criteria. Proc. 29th Internat. Conf. Machine Learning (ICML’2012) (Omnipress, Madison, WI), 387–396.
  • [25] Tan CH, Hartman JC (2011) Sensitivity analysis in Markov decision processes with uncertain reward parameters. J. Appl. Probab. 48(4):954–967.
  • [26] Xia L (2016) Optimization of Markov decision processes under the variance criterion. Automatica 73:269–278.
  • [27] Xia L (2018) Mean-variance optimization of discrete time discounted Markov decision processes. Automatica 88:76–82.
  • [28] Xia L (2020) Risk-sensitive Markov decision processes with combined metrics of mean and variance. Production Oper. Management 29(12):2808–2827.
  • [29] Xia L, Glynn PW (2016) A generalized fundamental matrix for computing fundamental quantities of Markov systems. Preprint, submitted April 15, https://arxiv.org/abs/1604.04343.
  • [30] Xie T, Liu B, Xu Y, Ghavamzadeh M, Chow Y, Lyu D, Yoon D (2018) A block coordinate ascent algorithm for mean-variance optimization. NIPS’18 Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 1065–1075.
  • [31] Zhang S, Liu B, Whiteson S (2021) Mean-variance policy iteration for risk-averse reinforcement learning. Proc. AAAI Conf. Artificial Intelligence 35(12):10905–10913.
  • [32] Zhou XY, Li D (2000) Continuous-time mean-variance portfolio selection: A stochastic LQ framework. Appl. Math. Optim. 42:19–33.
  • [33] Zhou XY, Yin G (2004) Markowitz’s mean-variance portfolio selection with regime switching: A continuous-time model. SIAM J. Control Optim. 42(4):1466–1482.