Mean-Variance Tradeoffs in an Undiscounted MDP

Matthew J. Sobel
State University of New York at Stony Brook, Stony Brook, New York

Published Online: 1 Feb 1994

https://doi.org/10.1287/opre.42.1.175

Abstract

A stationary policy and an initial state in an MDP (Markov decision process) induce a stationary probability distribution of the reward. The problem analyzed here is generating the Pareto optima in the sense of high mean and low variance of the stationary distribution. In the unichain case, Pareto optima can be computed either with policy improvement or with a linear program having the same number of variables and one more constraint than the formulation for gain-rate optimization. The same linear program suffices in the multichain case if the ergodic class is an element of choice.
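
To make the mean-variance comparison in the abstract concrete, the sketch below evaluates every deterministic stationary policy of a tiny unichain MDP and keeps the non-dominated ones. It is only an illustration, not the paper's method (the abstract refers to policy improvement and a linear program). The two-state, two-action data in `TRANSITIONS` and `REWARDS`, the assumption that the reward is a deterministic function of state and action, and all function names are hypothetical. The stationary distribution is approximated by power iteration, which assumes each policy's chain is unichain and aperiodic.

```python
from itertools import product

def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of a row-stochastic matrix P
    by power iteration (assumes a unichain, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def mean_variance(P, r):
    """Mean and variance of the reward under the stationary distribution,
    where r[s] is the (deterministic) reward earned in state s."""
    pi = stationary_distribution(P)
    mean = sum(p * x for p, x in zip(pi, r))
    second_moment = sum(p * x * x for p, x in zip(pi, r))
    return mean, second_moment - mean * mean

# Hypothetical MDP: TRANSITIONS[s][a] is the next-state distribution and
# REWARDS[s][a] the reward when action a is taken in state s.
TRANSITIONS = [
    [[0.9, 0.1], [0.5, 0.5]],
    [[0.5, 0.5], [0.1, 0.9]],
]
REWARDS = [[1.0, 2.0], [4.0, 3.0]]

def evaluate_policies():
    """Map each deterministic stationary policy to its (mean, variance)."""
    n_states = len(TRANSITIONS)
    n_actions = len(TRANSITIONS[0])
    results = {}
    for policy in product(range(n_actions), repeat=n_states):
        P = [TRANSITIONS[s][policy[s]] for s in range(n_states)]
        r = [REWARDS[s][policy[s]] for s in range(n_states)]
        results[policy] = mean_variance(P, r)
    return results

def pareto_optimal(results, tol=1e-12):
    """Keep policies not dominated in the (high mean, low variance) sense."""
    keep = []
    for p, (m, v) in results.items():
        dominated = any(
            m2 >= m - tol and v2 <= v + tol and (m2 > m + tol or v2 < v - tol)
            for q, (m2, v2) in results.items()
            if q != p
        )
        if not dominated:
            keep.append(p)
    return keep

if __name__ == "__main__":
    results = evaluate_policies()
    for policy, (m, v) in results.items():
        print(policy, round(m, 4), round(v, 4))
    print("Pareto optimal:", pareto_optimal(results))
```

In this toy instance, policy (1, 0) attains the highest mean and policy (1, 1) the lowest variance, so both survive the filter while the other two policies are dominated. Brute-force enumeration grows exponentially with the number of states and considers only deterministic policies, so it is practical only for very small examples like this one.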

Volume 42, Issue 1

January-February 1994

Pages 2-196

Cite as

Matthew J. Sobel (1994) Mean-Variance Tradeoffs in an Undiscounted MDP. Operations Research 42(1):175-183.

https://doi.org/10.1287/opre.42.1.175
