An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes

Arie Leizarowitz
Arie Leizarowitz
[email protected]
Department of Mathematics, Technion, Technion City, Haifa 32000, Israel
Search for more papers by this author

Arie Leizarowitz

[email protected]

Department of Mathematics, Technion, Technion City, Haifa 32000, Israel

Search for more papers by this author

Published Online:1 Aug 2003https://doi.org/10.1287/moor.28.3.553.16388

References

Bather J. Optimal decision procedures for finite Markov chains, II: Communicating systems. Adv. Appl. Probab. (1973a) 5:521–540Crossref, Google Scholar
Bather J. Optimal decision procedures for finite Markov chains, III: General convex systems. Adv. Appl. Probab. (1973b) 5:541–553Crossref, Google Scholar
Berman A., Plemmons R. J.Nonnegative Matrices in the Mathematical Sciences (1979) (Academic Press, New York) Google Scholar
Blackwell D. Discrete dynamic programming. Ann. Math. Statist. (1962) 33:719–726Crossref, Google Scholar
Borkar V. S. On minimum cost per unit time control of Markov chains. SIAM J. Control Optim. (1984) 22:965–984Crossref, Google Scholar
Borkar V. S. Control of Markov chains with long-run average cost criterion: The dynamic programming equations. SIAM J. Control Optim. (1989) 27:642–657Crossref, Google Scholar
Denardo E. V., Fox B. Multichain Markov renewal programs. SIAM J. Appl. Math. (1968) 16:468–487Crossref, Google Scholar
Denardo E. V., Rothblum U. G. Overtaking optimality for Markov decision chains. Math. Oper. Res. (1979) 4:144–152Link, Google Scholar
Derman C.Finite States Markovian Decision Processes (1970) (Academic Press, New York) Google Scholar
Dynkin E. B., Yushkevich A. A.Controlled Markov Processes (1979) (Springer-Verlag, New York) Crossref, Google Scholar
Federgruën A., Schweitzer J. P., Puterman M. L. Discounted and undiscounted value-iteration in Markov decision problems. A Survey in Dynamic Programming and Its Applications (1978) (Academic Press, New York) 23–52Crossref, Google Scholar
Federgruën A., Schweitzer J. P., Hartley R., Thomas L. C., White D. J. A survey of asymptotic value-iteration for undiscounted Markovian decision processes. Recent Developments in Markov Decision Processes (1980) (Academic Press, New York) 73–109Google Scholar
Federgruën A., Schweitzer J. P. A fixed-point approach to undiscounted Markov renewal programs. SIAM J. Algebraic Discrete Methods (1984) 5:539–550Crossref, Google Scholar
Federgruën A., Schweitzer P. J., Tijms H. C. Denumerable undiscounted semi-Markov decision processes with unbounded rewards. Math. Oper. Res. (1983) 8:298–313Link, Google Scholar
Federgruën A., Tijms H. C. The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. (1978) 15:356–373Crossref, Google Scholar
Feinberg E. A. On controlled finite state Markov processes with compact control sets. Theor. Probab. Appl. (1975) 20:856–861Crossref, Google Scholar
Feinberg E. A. On stationary strategies Borel dynamic programming. Math. Oper. Res. (1992) 17:392–397Link, Google Scholar
Filar J., Vrieze K.Competitive Markov Decision Processes (1997) (Springer-Verlag, New York) Google Scholar
Haviv M., Puterman M. L. An improved algorithm for solving communicating average reward Markov decision processes. Ann. Oper. Res. (1991) 28:229–242Crossref, Google Scholar
Heyman D. P., Sobel M. J.Stochastic Models in Operations Research, Vol. II: Stochastic Optimization (1984) (McGraw-Hill, New York) Google Scholar
Hinderer K.Foundations of Non-Stationary Dynamic Programming with Discrete-Time Parameter (1970) (Springer-Verlag, New York) Lecture Notes in Operations Research 33Crossref, Google Scholar
Hordijk A. Dynamic programming and Markov potential theory. Math. Centre Tracts (1974) 51(Amsterdam)Google Scholar
Hordijk A., Kallenberg L. C. M. Constrained undiscounted stochastic dynamic programming. Math. Oper. Res. (1984) 9:276–289Link, Google Scholar
Hordijk A., Puterman M. L. On the convergence of policy iteration in undiscounted finite state Markov processes: The unichain case. Math. Oper. Res. (1987) 12:163–176Link, Google Scholar
Leizarowitz A. Overtaking and almost-sure optimality for infinite horizon Markov decision processes. Math. Oper. Res. (1996) 21:158–181Link, Google Scholar
Martin-Löf A. Existence of stationary control for a Markov chain maximizing the average reward. Oper. Res. (1967) 15:866–871Link, Google Scholar
Platzman L. K. Improved conditions for convergence in undiscounted Markov renewal programming. Oper. Res. (1977) 25:529–533Link, Google Scholar
Puterman M. L.Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994) (John Wiley and Sons, Inc., New York) Crossref, Google Scholar
Ross K. W., Varadarajan R. Multichain Markov decision processes with a sample-path constraint: A decomposition approach. Math. Oper. Res. (1991) 16:195–217Link, Google Scholar
Rothblum U. G. Multiplicative Markov decison chains. Math. Oper. Res. (1984) 9:6–24Link, Google Scholar
Rothblum U. G., Whittle P. Growth optimality for branching Markov decision chains. Math. Oper. Res. (1982) 7:582–601Link, Google Scholar
Schäl M. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policy to be optimal. Z. Wahrch. verw. Gebiete (1975) 32:179–196Crossref, Google Scholar
Schäl M. On the second optimality equation for semi-Markov decision models. Math. Oper. Res. (1992) 17:470–486Link, Google Scholar
Schweitzer P. J. On the solvability of Bellman's functional equations for Markov renewal programs. J. Math. Anal. Appl. (1983) 96:13–23Crossref, Google Scholar
Schweitzer P. J. A Brouwer fixed-point mapping approach to communicating Markov decision processes. J. Math. Anal. Appl. (1987) 123:117–130Crossref, Google Scholar
Schweitzer P. J., Federgruën A. The functional equations of undiscounted Markov renewal programming. Math. Oper. Res. (1978) 3:308–321Link, Google Scholar
Seneta E.Non-Negative Matrices and Markov Chains (1981) (Springer-Verlag, New York) Crossref, Google Scholar
Strauch R. E. Negative dynamic programming. Ann. Math. Statist. (1966) 37:871–890Crossref, Google Scholar
Yushkevich A. A. On a class of policies in general Markov decision models. Theory Probabil. Appl. (1973) 18:777–779Crossref, Google Scholar
van der Wal J.Stochastic Dynamic Programming (1981) (The Mathematical Centre, Amsterdam, The Netherlands) Google Scholar
Varadarajan R. Reliability and performance models for reconfigurable computer systems. (1987) . Ph.D. thesis, University of Pennsylvania, Philadelphia, PAGoogle Scholar

cover image Mathematics of Operations Research

Volume 28, Issue 3

August 2003

Pages 395-608

Article Information

Metrics

Information

Received:October 06, 1998
Published Online:August 01, 2003

Cite as

Arie Leizarowitz, (2003) An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes. Mathematics of Operations Research 28(3):553-586.

https://doi.org/10.1287/moor.28.3.553.16388

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes

References

Volume 28, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News