On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP

Published Online: https://doi.org/10.1287/moor.1070.0279

References

  • Aberdeen D., Baxter J. Internal-state policy-gradient algorithms for infinite-horizon POMDPs. (2001). Technical report, RSISE, Australian National University, Canberra, Australia
  • Baxter J., Bartlett P. L. Infinite-horizon policy-gradient estimation. J. Artificial Intelligence Res. (2001) 15:319–350
  • Bertsekas D. P., Shreve S. Stochastic Optimal Control: The Discrete Time Case (1978) (Academic Press, New York)
  • Bierth K. J. An expected average reward criterion. Stochastic Process. Appl. (1987) 26:123–140
  • Dynkin E. B., Yushkevich A. A. Controlled Markov Processes (1979) (Springer-Verlag, New York)
  • Feinberg E. A. An ϵ-optimal control of a finite Markov chain with an average reward criterion. Theory Probab. Appl. (1980) 25:70–81
  • Feinberg E. A. Controlled Markov processes with arbitrary numerical criteria. Theory Probab. Appl. (1982) 27:486–503
  • Feinberg E. A. Nonrandomized Markov and semi-Markov strategies in dynamic programming. Theory Probab. Appl. (1982) 27:116–126
  • Fernández-Gaucherand E., Arapostathis A., Marcus S. I. On the average cost optimality equation and the structure of optimal policies for partially observable Markov decision processes. Ann. Oper. Res. (1991) 29:439–470
  • Hsu S.-P., Chuang D.-M., Arapostathis A. On the existence of stationary optimal policies for partially observed MDPs under the long-run average cost criterion. Systems Control Lett. (2006) 55:165–173
  • Jaakkola T. S., Singh S. P., Jordan M. I. Reinforcement learning algorithm for partially observable Markov decision problems. Proc. Neural Inform. Processing Systems Conf. (1995) Denver, CO (MIT Press, Cambridge, MA)
  • Lauritzen S. L. Graphical Models (1996) (Oxford University Press, Oxford, UK)
  • Meuleau N., Peshkin L., Kim K.-E., Kaelbling L. P. Learning finite-state controllers for partially observable environment. Proc. 15th Conf. Uncertainty in Artificial Intelligence (1999) Stockholm, Sweden (Morgan Kaufmann, San Francisco)
  • Platzman L. K. Optimal infinite-horizon undiscounted control of finite probabilistic systems. SIAM J. Control Optim. (1980) 18(4):362–380
  • Puterman M. L. Markov Decision Processes: Discrete Stochastic Dynamic Programming (1994) (John Wiley and Sons, Inc., New York)
  • Ross S. M. Arbitrary state Markovian decision processes. Ann. Math. Statist. (1968) 39(6):2118–2122
  • Runggaldier W. J., Stettner L. Approximations of Discrete Time Partially Observable Control Problems, Applied Mathematics Monographs (1994) 6 (Giardini Editori e Stampatori, Pisa, Italy)
  • Yu H. A function approximation approach to estimation of policy gradient for POMDP with structured policies. Proc. 21st Conf. Uncertainty in Artificial Intelligence (2005) Edinburgh, UK (AUAI Press)
  • Yu H. Approximate solution methods for partially observable Markov and semi-Markov decision processes. (2006). Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA