Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards

Published Online:https://doi.org/10.1287/moor.13.3.395

In this paper we consider a (discrete-time) Markov decision chain with a denumerable state space and compact action sets and we assume that for all states the rewards and transition probabilities depend continuously on the actions.

The first objective of this paper is to develop an analysis for average optimality without assuming a special Markov chain structure. In doing so, we present a set of conditions guaranteeing average optimality, which are automatically fulfilled in the finite state and action model.

The second objective is to study simultaneously average and discount optimality as Veinott (Veinott, A. F., Jr. 1969. On discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist.40 1635–1660.) did for the finite state and action model. We investigate the concepts of n-discount and Blackwell optimality in the denumerable state space, using a Laurent series expansion for the discounted rewards. Under the same condition as for average optimality, we establish solutions to the n-discount optimality equations for every n.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.