Relaxed Indexability and Index Policy for Partially Observable Restless Bandits

