Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards

Rommert Dekker
Koninklijke/Shell Laboratorium Amsterdam, P.O. Box 3003, Amsterdam, The Netherlands

Arie Hordijk
Institute of Applied Mathematics and Computer Science, University of Leiden, Leiden, The Netherlands

Published Online: 1 Aug 1988

https://doi.org/10.1287/moor.13.3.395

Abstract

In this paper we consider a (discrete-time) Markov decision chain with a denumerable state space and compact action sets, and we assume that for all states the rewards and transition probabilities depend continuously on the actions.

The first objective of this paper is to develop an analysis for average optimality without assuming a special Markov chain structure. In doing so, we present a set of conditions guaranteeing average optimality, which are automatically fulfilled in the finite state and action model.

The second objective is to study simultaneously average and discount optimality as Veinott (Veinott, A. F., Jr. 1969. On discrete dynamic programming with sensitive discount optimality criteria. Ann. Math. Statist. 40 1635–1660.) did for the finite state and action model. We investigate the concepts of n-discount and Blackwell optimality in the denumerable state space, using a Laurent series expansion for the discounted rewards. Under the same condition as for average optimality, we establish solutions to the n-discount optimality equations for every n.

Mathematics of Operations Research

Volume 13, Issue 3

August 1988

Pages 377-534

Cite as

Rommert Dekker, Arie Hordijk (1988) Average, Sensitive and Blackwell Optimal Policies in Denumerable Markov Decision Chains with Unbounded Rewards. Mathematics of Operations Research 13(3):395-420.

https://doi.org/10.1287/moor.13.3.395
