On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs

Ying Huang
Manufacturing Systems Research, Philips Laboratories, 345 Scarborough Road, Briarcliff Manor, NY 10510
L. C. M. Kallenberg
University of Leiden, The Netherlands

Published Online: 1 May 1994
https://doi.org/10.1287/moor.19.2.434

Abstract

This paper proves constructively the existence of optimal policies for maximum one-period mean-to-standard-deviation-ratio, negative variance-with-bounded-mean and mean-penalized-by-variance Markov decision chains by reducing them to a related mathematical program. This program entails maximizing (xB/D(xb)) + C(xb) over x in a polytope and with given bounds on xb where C and D are convex and either D is constant or D is positive and nondecreasing, C is nondecreasing and xB is nonpositive. This program is in turn reduced to maximizing x(B + θb) over x in the polytope parametrically in θ. Along the way, under the nonnegative-initial-distribution assumption, we generalize the rule of constructing a stationary maximum-average-reward policy from an extreme optimal solution of the associated linear program. The paper unifies and extends formulations and existence results for problems discussed by White (1987), Filar and Lee (1985), Sobel (1985), Kawai (1987) and Filar, Kallenberg and Lee (1989), and gives an effective computational procedure to solve them that is related to a method used by Kawai (1987) in a special case.
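
For orientation, the two programs named in the abstract can be written out as below. This is only a transcription of the abstract's wording, not the paper's own statement. The symbols X (the polytope) and \ell, u (the bounds on xb) are labels introduced here, and reading x as a row vector with B and b as column vectors, so that xB and xb are scalars, is an assumption.

```latex
% Original program. X, \ell and u are labels assumed here, not taken from the paper.
\[
  \max_{x \in X,\ \ell \le xb \le u} \; \frac{xB}{D(xb)} + C(xb)
\]

% Parametric linear program to which it is reduced, solved as \theta varies.
\[
  \max_{x \in X} \; x\,(B + \theta b)
\]
```

When D is constant, the first objective is a linear term in x plus C(xb), which is the simpler of the two cases the abstract distinguishes.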

Mathematics of Operations Research

Volume 19, Issue 2

May 1994

Pages 257-512

Cite as

Ying Huang, L. C. M. Kallenberg (1994) On Finding Optimal Policies for Markov Decision Chains: A Unifying Framework for Mean-Variance-Tradeoffs. Mathematics of Operations Research 19(2):434-448.

https://doi.org/10.1287/moor.19.2.434
