Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results

Vikram Krishnamurthy
[email protected]
Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Bo Wahlberg
[email protected]
Automatic Control and ACCESS, School of Electrical Engineering, KTH, SE-100 44 Stockholm, Sweden

Published Online: 10 Apr 2009
https://doi.org/10.1287/moor.1080.0371


Mathematics of Operations Research

Volume 34, Issue 2

May 2009

Pages 257-512


Received: July 24, 2008
Published Online: April 10, 2009

Cite as

Vikram Krishnamurthy, Bo Wahlberg (2009) Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results. Mathematics of Operations Research 34(2):287–302.

https://doi.org/10.1287/moor.1080.0371
