An Algorithm to Identify and Compute Average Optimal Policies in Multichain Markov Decision Processes

References

  • Bather J. Optimal decision procedures for finite Markov chains, II: Communicating systems. Adv. Appl. Probab. (1973a) 5:521–540CrossrefGoogle Scholar
  • Bather J. Optimal decision procedures for finite Markov chains, III: General convex systems. Adv. Appl. Probab. (1973b) 5:541–553CrossrefGoogle Scholar
  • Berman A., Plemmons R. J.Nonnegative Matrices in the Mathematical Sciences (1979) (Academic Press, New York) Google Scholar
  • Blackwell D. Discrete dynamic programming. Ann. Math. Statist. (1962) 33:719–726CrossrefGoogle Scholar
  • Borkar V. S. On minimum cost per unit time control of Markov chains. SIAM J. Control Optim. (1984) 22:965–984CrossrefGoogle Scholar
  • Borkar V. S. Control of Markov chains with long-run average cost criterion: The dynamic programming equations. SIAM J. Control Optim. (1989) 27:642–657CrossrefGoogle Scholar
  • Denardo E. V., Fox B. Multichain Markov renewal programs. SIAM J. Appl. Math. (1968) 16:468–487CrossrefGoogle Scholar
  • Denardo E. V., Rothblum U. G. Overtaking optimality for Markov decision chains. Math. Oper. Res. (1979) 4:144–152LinkGoogle Scholar
  • Derman C.Finite States Markovian Decision Processes (1970) (Academic Press, New York) Google Scholar
  • Dynkin E. B., Yushkevich A. A.Controlled Markov Processes (1979) (Springer-Verlag, New York) CrossrefGoogle Scholar
  • Federgruën A., Schweitzer J. P., Puterman M. L. Discounted and undiscounted value-iteration in Markov decision problems. A Survey in Dynamic Programming and Its Applications (1978) (Academic Press, New York) 23–52CrossrefGoogle Scholar
  • Federgruën A., Schweitzer J. P., Hartley R., Thomas L. C., White D. J. A survey of asymptotic value-iteration for undiscounted Markovian decision processes. Recent Developments in Markov Decision Processes (1980) (Academic Press, New York) 73–109Google Scholar
  • Federgruën A., Schweitzer J. P. A fixed-point approach to undiscounted Markov renewal programs. SIAM J. Algebraic Discrete Methods (1984) 5:539–550CrossrefGoogle Scholar
  • Federgruën A., Schweitzer P. J., Tijms H. C. Denumerable undiscounted semi-Markov decision processes with unbounded rewards. Math. Oper. Res. (1983) 8:298–313LinkGoogle Scholar
  • Federgruën A., Tijms H. C. The optimality equation in average cost denumerable state semi-Markov decision problems, recurrency conditions and algorithms. J. Appl. Probab. (1978) 15:356–373CrossrefGoogle Scholar
  • Feinberg E. A. On controlled finite state Markov processes with compact control sets. Theor. Probab. Appl. (1975) 20:856–861CrossrefGoogle Scholar
  • Feinberg E. A. On stationary strategies Borel dynamic programming. Math. Oper. Res. (1992) 17:392–397LinkGoogle Scholar
  • Filar J., Vrieze K.Competitive Markov Decision Processes (1997) (Springer-Verlag, New York) Google Scholar
  • Haviv M., Puterman M. L. An improved algorithm for solving communicating average reward Markov decision processes. Ann. Oper. Res. (1991) 28:229–242CrossrefGoogle Scholar
  • Heyman D. P., Sobel M. J.Stochastic Models in Operations Research, Vol. II: Stochastic Optimization (1984) (McGraw-Hill, New York) Google Scholar
  • Hinderer K.Foundations of Non-Stationary Dynamic Programming with Discrete-Time Parameter (1970) (Springer-Verlag, New York) Lecture Notes in Operations Research 33CrossrefGoogle Scholar
  • Hordijk A. Dynamic programming and Markov potential theory. Math. Centre Tracts (1974) 51(Amsterdam)Google Scholar
  • Hordijk A., Kallenberg L. C. M. Constrained undiscounted stochastic dynamic programming. Math. Oper. Res. (1984) 9:276–289LinkGoogle Scholar
  • Hordijk A., Puterman M. L. On the convergence of policy iteration in undiscounted finite state Markov processes: The unichain case. Math. Oper. Res. (1987) 12:163–176LinkGoogle Scholar
  • Leizarowitz A. Overtaking and almost-sure optimality for infinite horizon Markov decision processes. Math. Oper. Res. (1996) 21:158–181LinkGoogle Scholar
  • Martin-Löf A. Existence of stationary control for a Markov chain maximizing the average reward. Oper. Res. (1967) 15:866–871LinkGoogle Scholar
  • Platzman L. K. Improved conditions for convergence in undiscounted Markov renewal programming. Oper. Res. (1977) 25:529–533LinkGoogle Scholar
  • Puterman M. L.Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994) (John Wiley and Sons, Inc., New York) CrossrefGoogle Scholar
  • Ross K. W., Varadarajan R. Multichain Markov decision processes with a sample-path constraint: A decomposition approach. Math. Oper. Res. (1991) 16:195–217LinkGoogle Scholar
  • Rothblum U. G. Multiplicative Markov decison chains. Math. Oper. Res. (1984) 9:6–24LinkGoogle Scholar
  • Rothblum U. G., Whittle P. Growth optimality for branching Markov decision chains. Math. Oper. Res. (1982) 7:582–601LinkGoogle Scholar
  • Schäl M. Conditions for optimality in dynamic programming and for the limit of n-stage optimal policy to be optimal. Z. Wahrch. verw. Gebiete (1975) 32:179–196CrossrefGoogle Scholar
  • Schäl M. On the second optimality equation for semi-Markov decision models. Math. Oper. Res. (1992) 17:470–486LinkGoogle Scholar
  • Schweitzer P. J. On the solvability of Bellman's functional equations for Markov renewal programs. J. Math. Anal. Appl. (1983) 96:13–23CrossrefGoogle Scholar
  • Schweitzer P. J. A Brouwer fixed-point mapping approach to communicating Markov decision processes. J. Math. Anal. Appl. (1987) 123:117–130CrossrefGoogle Scholar
  • Schweitzer P. J., Federgruën A. The functional equations of undiscounted Markov renewal programming. Math. Oper. Res. (1978) 3:308–321LinkGoogle Scholar
  • Seneta E.Non-Negative Matrices and Markov Chains (1981) (Springer-Verlag, New York) CrossrefGoogle Scholar
  • Strauch R. E. Negative dynamic programming. Ann. Math. Statist. (1966) 37:871–890CrossrefGoogle Scholar
  • Yushkevich A. A. On a class of policies in general Markov decision models. Theory Probabil. Appl. (1973) 18:777–779CrossrefGoogle Scholar
  • van der Wal J.Stochastic Dynamic Programming (1981) (The Mathematical Centre, Amsterdam, The Netherlands) Google Scholar
  • Varadarajan R. Reliability and performance models for reconfigurable computer systems. (1987) . Ph.D. thesis, University of Pennsylvania, Philadelphia, PAGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.