Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes

Dmitry Krass
Rotman School of Management, University of Toronto, 105 St. George Street, Toronto, Canada M5S 3E6

O. J. Vrieze
Maastricht University, Department of Mathematics, P.O. Box 616, 6200 MD Maastricht, The Netherlands

Published Online: 1 Aug 2002
https://doi.org/10.1287/moor.27.3.545.316

Abstract

In this paper we address a basic problem that arises naturally in average-reward Markov decision processes with constraints and/or nonstandard payoff criteria: Given a feasible state-action frequency vector (“the target”), construct a policy whose state-action frequencies match those of the target vector.

While it is well known that the solution to this problem cannot, in general, be found in the space of stationary randomized policies, we construct a solution that has “ultimately stationary” structure: It consists of two stationary policies where the first one is used initially, and then the switch to the second one is made at a certain random switching time. The computational effort required to construct this solution is minimal.

We also show that our problem can always be solved by a stationary policy if the original MDP is “extended” by adding certain states and actions. The solution in the original MDP is obtained by mapping the solution in the extended MDP back to the original process.
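
The gap between stationary and "ultimately stationary" policies described in the abstract can be seen in a very small example. The sketch below is an illustration only and is not the construction from the paper: the two-state MDP, the policy names (`go`, `stay`, `half`), the helper functions, and the degenerate switching rule (switch at time 0 with some probability, otherwise never) are all assumptions made here. It computes expected state-action frequencies over a finite horizon as a stand-in for the long-run averages.

```python
# Toy two-state MDP (hypothetical, not taken from the paper).
# State 0: "stay" keeps the process in state 0, "go" moves it to state 1.
# State 1: absorbing, with the single action "loop".
NEXT_STATE = {(0, "stay"): 0, (0, "go"): 1, (1, "loop"): 1}


def expected_frequencies(policy, horizon, start=0):
    """Expected state-action frequencies over `horizon` steps.

    `policy[s][a]` is the probability of choosing action a in state s
    (a stationary randomized policy). Transitions are deterministic here.
    """
    dist = {0: 0.0, 1: 0.0}
    dist[start] = 1.0
    freq = {sa: 0.0 for sa in NEXT_STATE}
    for _ in range(horizon):
        nxt = {0: 0.0, 1: 0.0}
        for s, p_state in dist.items():
            for a, p_action in policy[s].items():
                mass = p_state * p_action
                freq[(s, a)] += mass / horizon
                nxt[NEXT_STATE[(s, a)]] += mass
        dist = nxt
    return freq


def switched_frequencies(first, second, p_switch_now, horizon):
    """Frequencies when the switch from `first` to `second` happens at
    time 0 with probability `p_switch_now` and never otherwise
    (a deliberately degenerate random switching time, assumed here)."""
    f1 = expected_frequencies(first, horizon)
    f2 = expected_frequencies(second, horizon)
    return {sa: (1 - p_switch_now) * f1[sa] + p_switch_now * f2[sa]
            for sa in NEXT_STATE}


go = {0: {"go": 1.0}, 1: {"loop": 1.0}}
stay = {0: {"stay": 1.0}, 1: {"loop": 1.0}}
half = {0: {"stay": 0.5, "go": 0.5}, 1: {"loop": 1.0}}

# Stationary randomization in state 0 versus a random switch between policies.
stationary = expected_frequencies(half, horizon=10_000)
switched = switched_frequencies(go, stay, p_switch_now=0.5, horizon=10_000)
```

In this toy model, any stationary policy that plays "go" with positive probability is eventually absorbed in state 1, so `stationary` puts almost all of its mass on `(1, "loop")`. The switched policy instead places roughly half of its mass on `(0, "stay")` and half on `(1, "loop")`, a target that no stationary policy in this example attains.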

Mathematics of Operations Research

Volume 27, Issue 3

August 2002

Pages 445-635

Information

Received: March 14, 1996
Published Online: August 01, 2002

Cite as

Dmitry Krass, O. J. Vrieze (2002) Achieving Target State-Action Frequencies in Multichain Average-Reward Markov Decision Processes. Mathematics of Operations Research 27(3):545-566.

https://doi.org/10.1287/moor.27.3.545.316
