On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP

Huizhen Yu
Department of Computer Science, P.O. Box 68, FIN-00014 University of Helsinki, Finland

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA

Published Online: 1 Feb 2008
https://doi.org/10.1287/moor.1070.0279


Mathematics of Operations Research

Volume 33, Issue 1

February 2008

Pages 1-256

Article Information

Received: May 22, 2006
Published Online: February 01, 2008

Cite as

Huizhen Yu, Dimitri P. Bertsekas (2008) On Near Optimality of the Set of Finite-State Controllers for Average Cost POMDP. Mathematics of Operations Research 33(1):1-11.

https://doi.org/10.1287/moor.1070.0279
