Computing a Bias-Optimal Policy in a Discrete-Time Markov Decision Problem

Published Online: https://doi.org/10.1287/opre.18.2.279

This paper treats a discrete-time Markov decision model with an infinite planning horizon and no discounting. A “bias-optimal” policy for this decision problem satisfies a criterion that is more selective than maximizing the gain rate. The problem of computing a bias-optimal policy, also treated by Veinott in 1966, is here parsed into a sequence of three simple Markov decision problems, each of which can be solved by linear programming or policy iteration.
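
To illustrate why the bias criterion is more selective than the gain rate, the following minimal sketch evaluates two stationary policies that earn the same gain but have different biases. It is not the paper's three-stage procedure. It only evaluates a fixed policy, using the standard definitions: the gain is the long-run average reward, and the bias is the accumulated difference between the expected reward at each step and the gain. The two-state model, the reward values, and the names `mat_vec` and `gain_and_bias` are hypothetical. The chain is assumed to be aperiodic, so that plain limits exist, and the infinite sum is truncated at a fixed number of steps.

```python
def mat_vec(P, v):
    """Multiply a row-stochastic matrix P (list of rows) by a vector v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def gain_and_bias(P, r, steps=100):
    """Approximate the gain and bias of one fixed stationary policy.

    Assumes an aperiodic chain: P^t r then converges to the gain, and the
    bias is the sum over t of (P^t r - gain), truncated here at `steps`.
    """
    x = list(r)          # x holds P^t r, the expected reward at step t
    history = []
    for _ in range(steps):
        history.append(x)
        x = mat_vec(P, x)
    gain = x             # P^steps r, used as an approximation of the gain
    bias = [sum(h[i] - gain[i] for h in history) for i in range(len(r))]
    return gain, bias


# Hypothetical model: state 1 is absorbing and pays 2 per step.
# In state 0, either action moves to state 1; action "a" pays 3, "b" pays 1.
P = [[0, 1],
     [0, 1]]
gain_a, bias_a = gain_and_bias(P, [3, 2])
gain_b, bias_b = gain_and_bias(P, [1, 2])

print(gain_a, bias_a)  # [2, 2] [1, 0]
print(gain_b, bias_b)  # [2, 2] [-1, 0]
```

Both policies have gain 2 in every state, so maximizing the gain rate alone cannot separate them. The bias in state 0 is 1 under action "a" and -1 under action "b", so only the first policy is bias-optimal in this toy model.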
