Relaxed Indexability and Index Policy for Partially Observable Restless Bandits

Keqin Liu
Keqin Liu
[email protected]
https://orcid.org/0000-0002-4783-346X
School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; and Jiangsu National Center for Applied Mathematics, Nanjing 210093, China
Search for more papers by this author

School of Mathematics and Physics, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China; and Jiangsu National Center for Applied Mathematics, Nanjing 210093, China

Search for more papers by this author

Published Online:16 Apr 2025https://doi.org/10.1287/mnsc.2022.02831

References

Bertsekas DP (1987) Dynamic Programming: Deterministic and Stochastic Models (Prentice Hall, Hoboken, NJ).Google Scholar
Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multi-armed bandit problems. Math. Oper. Res. 21(2):257–306.Link, Google Scholar
Bertsimas D, Niño-Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.Link, Google Scholar
Brown DB, Smith JE (2020) Index policies and performance bounds for dynamic selection problems. Management Sci. 66(7):3029–3050.Link, Google Scholar
Elmaghraby HM, Liu K, Ding Z (2018) Femtocell scheduling as a restless multiarmed bandit problem using partial channel state observation. Proc. IEEE Internat. Conf. Comm. (ICC) (IEEE, Piscataway, NJ), 1–6.Google Scholar
Frostig E, Weiss G (2016) Four proofs of Gittins’ multiarmed bandit theorem. Ann. Oper. Res. 241:127–165.Crossref, Google Scholar
Gast N, Gaujal B, Yan C (2021) (Close to) Optimal policies for finite horizon restless bandits. Working paper. https://hal.inria.fr/hal-03262307/file/LP_paper.pdf.Google Scholar
Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. 41(2):148–177.Crossref, Google Scholar
Gittins JC, Glazebrook KD, Weber RR (2011) Multi-Armed Bandit Allocation Indices (Wiley, Chichester, UK).Crossref, Google Scholar
Glazebrook KD, Kirkbride C, Ouenniche J (2009) Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Oper. Res. 57(4):975–989.Link, Google Scholar
Hodge DJ, Glazebrook KD (2011) Dynamic resource allocation in a multi-product make-to-stock production system. Queueing Systems 67:333–364.Crossref, Google Scholar
Hu W, Frazier PI (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https://arxiv.org/abs/1707.00205.Google Scholar
Lapiccirella FE, Liu K, Ding Z (2011) Multi-channel opportunistic access based on primary ARQ messages overhearing. Proc. IEEE Internat. Conf. Comm. (ICC) (IEEE, Piscataway, NJ), 1–5.Google Scholar
Le Ny J, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Proc. Amer. Control Conf. (IEEE, Piscataway, NJ), 4220–4225.Google Scholar
Liu K (2020) Whittle index for restless bandits with expanding state spaces. Numer. Math. 42(4):372–384.Google Scholar
Liu K, Weber RR, Zhang C (2024) Low-complexity algorithm for restless bandits with imperfect observations. Math. Methods Oper. Res. 100(2):467–508.Crossref, Google Scholar
Liu K, Weber RR, Zhao Q (2011) Indexability and Whittle index for restless bandit problems involving reset processes. Proc. 50th IEEE Conf. Decision Control (IEEE, Piscataway, NJ), 7690–7696.Google Scholar
Liu K, Zhao Q (2008) A restless bandit formulation of opportunistic access: Indexability and index Policy. Proc. IEEE Workshop Networking Technol. Software Defined Radio (SDR) Networks (IEEE, Piscataway, NJ), 1–5.Google Scholar
Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
Liu K, Zhao Q (2012) Dynamic intrusion detection in resource-constrained cyber networks. IEEE Internat. Symposium Inform. Theory Proc. (IEEE, Piscataway, NJ), 970–974.Google Scholar
Liu K, Zhao Q, Krishnamachari B (2010) Dynamic multichannel access with imperfect channel state detection. IEEE Trans. Signal Process. 58(5):2795–2808.Crossref, Google Scholar
Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33:76–98.Crossref, Google Scholar
Niño-Mora J (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15:161–198.Crossref, Google Scholar
Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queueing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
Sondik EJ (1978) The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Oper. Res. 26(2):282–304.Link, Google Scholar
Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):275–294.Crossref, Google Scholar
Verloop IM (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.Crossref, Google Scholar
Wang K, Chen L, Liu Q (2014) On optimality of myopic policy for opportunistic access with nonidentical channels and imperfect sensing. IEEE Trans. Veh. Technol. 63(5):2478–2483.Crossref, Google Scholar
Weber RR (1992) On the Gittins index for multiarmed bandits. Ann. Probab. 2:1024–1033.Google Scholar
Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
Weber RR, Weiss G (1991) Addendum to ‘On an index policy for restless bandits’. Adv. Appl. Probab. 23:429–430.Crossref, Google Scholar
Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. Ser. B 42(2):143–149.Crossref, Google Scholar
Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(a):287–298.Crossref, Google Scholar
Zayas-Cabán G, Jasin S, Wang G (2019) An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Adv. Appl. Probab. 51(3):745–772.Crossref, Google Scholar
Zhao Q (2019) Multi-Armed Bandits: Theory and Applications to Online Learning in Networks (Morgan & Claypool, San Rafael, CA).Google Scholar

Volume 71, Issue 12

December 2025

Pages vii-x, 9869-10753, iv-vi

Article Information

Supplemental Material

Metrics

Information

Received:September 12, 2022
Accepted:December 20, 2024
Published Online:April 16, 2025

Cite as

Keqin Liu (2025) Relaxed Indexability and Index Policy for Partially Observable Restless Bandits. Management Science 71(12):10106-10121.

https://doi.org/10.1287/mnsc.2022.02831

Keywords

Acknowledgments

The author gratefully acknowledges the help from his students, Jiale Zha and Chengzhong Zhang, for the numerical analysis and figures. The author’s colleague Prof. Ting Wu and the anonymous reviewers provided very helpful comments for improving this paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Relaxed Indexability and Index Policy for Partially Observable Restless Bandits

References

Volume 71, Issue 12

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News