Uniqueness and Stability of Optimal Policies of Finite State Markov Decision Processes

Published Online: https://doi.org/10.1287/moor.1060.0232

References

  • Aubin J. P., Ekeland I. Applied Nonlinear Analysis (1984) (Wiley Interscience, New York)
  • Bather J. Optimal decision procedures for finite Markov chains, I. Adv. Appl. Probab. (1973) 5:328–339
  • Bather J. Optimal decision procedures for finite Markov chains, II. Adv. Appl. Probab. (1973) 5:521–540
  • Bather J. Optimal decision procedures for finite Markov chains, III. Adv. Appl. Probab. (1973) 5:541–553
  • Borkar V. S. On minimum cost per unit time control of Markov chains. SIAM J. Control Optim. (1984) 22:965–984
  • Borkar V. S. Control of Markov chains with long-run average cost criterion: The dynamic programming equations. SIAM J. Control Optim. (1989) 27:642–657
  • Federgruen A., Schweitzer P. J. A fixed-point approach to undiscounted Markov renewal programs. SIAM J. Algebraic Discrete Methods (1984) 5:539–550
  • Federgruen A., Schweitzer P. J., Tijms H. C. Denumerable undiscounted semi-Markov decision processes with unbounded rewards. Math. Oper. Res. (1983) 8:298–313
  • Feinberg E. A. On controlled finite state Markov processes with compact control sets. Theory Probab. Appl. (1975) 20:856–861
  • Hinderer K. Foundation of Non-Stationary Dynamic Programming with Discrete-Time Parameter. Lecture Notes in Operations Research (1970) 33 (Springer-Verlag, New York)
  • Hordijk A. Dynamic programming and Markov potential theory. Math. Centre. Tracts (1974) 51 (Amsterdam, The Netherlands)
  • Hordijk A., Puterman M. L. On the convergence of policy iteration in undiscounted finite state Markov processes: The unichain case. Math. Oper. Res. (1987) 12:163–176
  • Kelley J. L. General Topology (1975) 27 (Springer-Verlag, New York)
  • Leizarowitz A. Overtaking and almost-sure optimality for infinite horizon Markov decision processes. Math. Oper. Res. (1996) 21:158–181
  • Leizarowitz A. An algorithm to identify and compute average optimal policies in multichain Markov decision processes. Math. Oper. Res. (2003) 28:553–586
  • Schweitzer P. J. On the solvability of Bellman’s functional equation for Markov renewal programs. J. Math. Anal. Appl. (1983) 96:13–23
  • Schweitzer P. J. A Brouwer fixed-point mapping approach to communicating Markov decision processes. J. Math. Anal. Appl. (1987) 123:117–130