Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results

DOI: https://doi.org/10.1287/moor.1080.0371
