Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results

Vikram Krishnamurthy
Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
Bo Wahlberg
Automatic Control and ACCESS, School of Electrical Engineering, KTH, SE-100 44 Stockholm, Sweden

Published Online: 10 Apr 2009
https://doi.org/10.1287/moor.1080.0371

Abstract

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process separately, thereby considerably reducing the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.
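
The structural conditions named above can be checked numerically. The following is a minimal Python sketch (assuming NumPy) of three standard ingredients: a TP2 test on a matrix, an MLR-dominance test between two information states (beliefs), and the Bayesian (HMM filter) update of the information state. The notation is an assumption, not taken from the paper: P[i, j] is the probability of moving from state i to state j, B[i, y] is the probability of observing y in state i, and states are indexed in increasing order. The helper names are hypothetical, and the sketch does not compute the Gittins index or reproduce the authors' algorithms.

```python
import numpy as np


def is_tp2(A, tol=1e-12):
    """True if every 2x2 minor A[i1,j1]*A[i2,j2] - A[i1,j2]*A[i2,j1]
    with i1 < i2 and j1 < j2 is nonnegative (up to tol)."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    for i1 in range(m):
        for i2 in range(i1 + 1, m):
            for j1 in range(n):
                for j2 in range(j1 + 1, n):
                    if A[i1, j1] * A[i2, j2] - A[i1, j2] * A[i2, j1] < -tol:
                        return False
    return True


def mlr_geq(p1, p2, tol=1e-12):
    """True if belief p1 MLR-dominates p2, i.e. p1[j]*p2[i] >= p1[i]*p2[j]
    for all i < j; equivalently the 2-row matrix [p2; p1] is TP2."""
    return is_tp2(np.vstack([p2, p1]), tol)


def belief_update(pi, P, B, y):
    """HMM filter: new belief proportional to B[j, y] * sum_i pi[i] * P[i, j]."""
    pi = np.asarray(pi, dtype=float)
    predicted = pi @ np.asarray(P, dtype=float)           # one-step prediction
    unnormalized = np.asarray(B, dtype=float)[:, y] * predicted
    return unnormalized / unnormalized.sum()              # normalize to a distribution


if __name__ == "__main__":
    # Illustrative two-state arm (numbers chosen for the example only).
    P = np.array([[0.8, 0.2], [0.3, 0.7]])
    B = np.array([[0.9, 0.1], [0.2, 0.8]])
    print(is_tp2(P), is_tp2(B))  # True True
    pi1 = np.array([0.3, 0.7])   # more mass on the higher state
    pi2 = np.array([0.6, 0.4])
    print(mlr_geq(pi1, pi2))     # True
    for y in range(B.shape[1]):
        new1 = belief_update(pi1, P, B, y)
        new2 = belief_update(pi2, P, B, y)
        print(y, new1, new2, mlr_geq(new1, new2))
```

For this illustrative two-state arm, both P and B are TP2, and the loop checks, for each observation y, whether the updated information states keep the MLR order of the prior ones. A check on one example like this is only a sanity test, not a proof of the structural result.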

Mathematics of Operations Research

Volume 34, Issue 2

May 2009

Pages 257-512

Article Information

Received: July 24, 2008
Published Online: April 10, 2009

Cite as

Vikram Krishnamurthy, Bo Wahlberg, (2009) Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results. Mathematics of Operations Research 34(2):287-302.

https://doi.org/10.1287/moor.1080.0371
