Constrained Markov Decision Chains

Published Online: https://doi.org/10.1287/mnsc.19.4.389

We consider finite-state, finite-action, discrete-time-parameter Markov decision chains. The objective is to provide an algorithm for finding a policy that minimizes the long-run expected average cost when there are linear side conditions on the limit points of the expected state-action frequencies. This problem has previously been solved only for the case where every deterministic stationary policy has at most one ergodic class. This note removes that restriction by applying the Dantzig-Wolfe decomposition principle.
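
To make the objects in the problem statement concrete, the following minimal sketch computes the state-action frequencies of one fixed randomized stationary policy, then evaluates its long-run average cost and one linear side condition. It is not the algorithm of this note and involves no Dantzig-Wolfe decomposition. The two-state chain, the costs, the policy, the bound 0.3, and every name in the code are hypothetical, and the power iteration assumes the induced chain has a single aperiodic ergodic class.

```python
# Hypothetical 2-state, 2-action chain; all numbers are made up for illustration.
# P[s][a][j] = probability of moving from state s to state j under action a.
P = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.1, 0.9]],
]
cost = [[1.0, 3.0], [2.0, 0.5]]    # cost[s][a], one-step cost
policy = [[0.5, 0.5], [1.0, 0.0]]  # policy[s][a], probability of action a in state s


def state_action_frequencies(P, policy, iters=500):
    """Return x[s][a], the long-run fraction of time spent in state s taking action a.

    Assumes the chain induced by `policy` has one aperiodic ergodic class,
    so that power iteration converges to its stationary distribution.
    """
    n = len(P)
    # Transition matrix of the Markov chain induced by the policy.
    T = [
        [sum(policy[s][a] * P[s][a][j] for a in range(len(P[s]))) for j in range(n)]
        for s in range(n)
    ]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[s] * T[s][j] for s in range(n)) for j in range(n)]
    return [[pi[s] * policy[s][a] for a in range(len(P[s]))] for s in range(n)]


x = state_action_frequencies(P, policy)

# Long-run expected average cost is linear in the frequencies.
avg_cost = sum(cost[s][a] * x[s][a] for s in range(2) for a in range(2))

# A linear side condition, here: action 1 is used at most 30% of the time.
action1_freq = sum(x[s][1] for s in range(2))
side_condition_ok = action1_freq <= 0.3

print(avg_cost, action1_freq, side_condition_ok)
```

Because both the average cost and the side condition are linear in the frequencies x, the constrained problem can be posed over the set of achievable frequencies. When some deterministic stationary policy has more than one ergodic class, the frequencies depend on the initial state and this single-distribution calculation no longer applies, which is the case the note addresses.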
