Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Li Xia
Li Xia
[email protected]
https://orcid.org/0000-0001-9141-2569
School of Business, Sun Yat-sen University, Guangzhou 510275, China
Search for more papers by this author
,
Shuai Ma
Corresponding Author
Shuai Ma
[email protected]
https://orcid.org/0000-0003-3452-1739
School of Business, Sun Yat-sen University, Guangzhou 510275, China
Search for more papers by this author

School of Business, Sun Yat-sen University, Guangzhou 510275, China

Search for more papers by this author

Shuai Ma

Corresponding Author

Shuai Ma

[email protected]

https://orcid.org/0000-0003-3452-1739

School of Business, Sun Yat-sen University, Guangzhou 510275, China

Search for more papers by this author

Published Online:25 Feb 2025https://doi.org/10.1287/moor.2023.0176

Abstract

Dynamic optimization of mean and variance in Markov decision processes (MDPs) is a long-standing challenge caused by the failure of dynamic programming. In this paper, we propose a new approach to finding the globally optimal policy for combined metrics of steady-state mean and variance in an infinite-horizon undiscounted MDP. By introducing the concepts of pseudo mean and pseudo variance, we convert the original problem to a bilevel MDP problem, where the inner one is a standard MDP optimizing pseudo mean-variance, and the outer one is a single-parameter selection problem optimizing pseudo mean. We use the sensitivity analysis of MDPs to derive the properties of this bilevel problem. By solving inner standard MDPs for pseudo mean-variance optimization, we can identify worse policy spaces dominated by optimal policies of the pseudo problems. We propose an optimization algorithm that can find the globally optimal policy by repeatedly removing worse policy spaces. The convergence and complexity of the algorithm are studied. Another policy dominance property is also proposed to further improve the algorithm efficiency. Numerical experiments demonstrate the performance and efficiency of our algorithms. To the best of our knowledge, our algorithm is the first that efficiently finds the globally optimal policy of mean-variance optimization in MDPs. Our results are also valid for solely minimizing the variance metrics and can shed light on solving other varied forms of mean-variance MDPs.

Funding: This research was supported in part by the National Key Research and Development Program of China [Grant 2022YFA1004600], the National Natural Science Foundation of China [Grants 72342006 and 72371253], the Guangdong Basic and Applied Basic Research Foundation [Grants 2023A1515012492 and 2023B1515040001], the Regional Joint Foundation of Guangdong [Grant 2022A1515110725], and the Guangdong Province Key Laboratory of Computational Science at the Sun Yat-sen University. The work described in this paper was also partially supported by the InnoHK Initiative, the Government of the HKSAR, and the Laboratory for AI-Powered Financial Technologies.

cover image Mathematics of Operations Research

Volume 51, Issue 1

February 2026

Pages iv-viii, 1-851

Article Information

Metrics

Information

Received:June 06, 2023
Accepted:January 25, 2025
Published Online:February 25, 2025

Cite as

Li Xia, Shuai Ma (2025) Global Algorithms for Mean-Variance Optimization in Markov Decision Processes. Mathematics of Operations Research 51(1):440-455.

https://doi.org/10.1287/moor.2023.0176

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Global Algorithms for Mean-Variance Optimization in Markov Decision Processes

Abstract

Volume 51, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News