Existence of a Stationary Control for a Markov Chain Maximizing the Average Reward

Published Online: https://doi.org/10.1287/opre.15.5.866

The problem of optimal control of a discrete-time stationary Markov chain with complete state information has been considered by many authors. The case with finitely many states and controls has been thoroughly investigated. Chains with infinitely many states or controls have also been considered under various assumptions concerning the reward function. In this paper the existence of a control maximizing the average reward is established for Markov chains with a finite number of states and an arbitrary compact set of possible actions in each state. It is assumed that, for every control, the chain has only one ergodic class and no transient states. The proof uses methods from convex programming and is analogous to the linear programming approach used by Wolfe and Dantzig.
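To make the objective concrete, the following is a minimal sketch of the finite-action special case only, not the convex programming argument of the paper. It enumerates every stationary deterministic control of a small chain, computes the stationary distribution under each, and keeps the control with the largest average reward. The names `TRANS`, `REWARD`, `stationary_distribution`, and `best_stationary_policy`, as well as the two-state numbers, are hypothetical and chosen purely for illustration. The sketch assumes, as the abstract does, that every control yields a single ergodic class with no transient states.

```python
from itertools import product


def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Approximate pi with pi = pi P for an irreducible transition matrix P.

    Iterates the "lazy" chain 0.5*I + 0.5*P, which has the same stationary
    distribution as P but is aperiodic, so the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        new = [
            0.5 * pi[j] + 0.5 * sum(pi[i] * P[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


def best_stationary_policy(trans, reward):
    """Brute-force search over stationary deterministic controls.

    trans[s][a] is the transition row out of state s under action a;
    reward[s][a] is the one-step reward. Returns (policy, average reward).
    """
    n = len(trans)
    best_policy, best_gain = None, float("-inf")
    for policy in product(*(range(len(trans[s])) for s in range(n))):
        P = [trans[s][policy[s]] for s in range(n)]
        pi = stationary_distribution(P)
        # Average reward of an ergodic chain: stationary-weighted reward.
        gain = sum(pi[s] * reward[s][policy[s]] for s in range(n))
        if gain > best_gain:
            best_policy, best_gain = policy, gain
    return best_policy, best_gain


# Hypothetical two-state, two-action example (illustrative numbers only).
TRANS = [
    [[0.5, 0.5], [0.1, 0.9]],  # state 0: actions 0 and 1
    [[0.5, 0.5], [0.9, 0.1]],  # state 1: actions 0 and 1
]
REWARD = [
    [1.0, 2.0],  # state 0
    [0.0, 0.5],  # state 1
]

policy, gain = best_stationary_policy(TRANS, REWARD)
print(policy, round(gain, 4))  # prints "(1, 1) 1.25"
```

Enumeration is feasible only when each state has finitely many actions. With an arbitrary compact action set, which is the setting of the paper, no such finite search is available, and this is why an existence argument is needed.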
