On Stationary Strategies in Borel Dynamic Programming

Published Online: https://doi.org/10.1287/moor.17.2.392

We consider a discrete-time Markov decision model with Borel state and action spaces. It is proved that the supremum of the expected total rewards under all randomized stationary strategies is equal to the supremum of these rewards under all (nonrandomized) stationary strategies.
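In symbols, the result can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: RS denotes the set of randomized stationary strategies, S the set of nonrandomized stationary strategies, and v(x, \pi) the expected total reward under strategy \pi from initial state x. The sketch also assumes that the supremum is taken for each fixed initial state.

\[
  \sup_{\pi \in RS} v(x, \pi) \;=\; \sup_{\varphi \in S} v(x, \varphi)
\]

Since every nonrandomized stationary strategy is a special case of a randomized one, S \subseteq RS, and the inequality \sup_{\varphi \in S} v(x, \varphi) \le \sup_{\pi \in RS} v(x, \pi) is immediate. The substance of the result is the reverse inequality.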
