Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

Nicolas Gast
Nicolas Gast
[email protected]
University of Grenoble Alpes, Institut national de recherche en informatique et en automatique, Centre national de la recherche scientifique, Grenoble Institut national polytechnique, Laboratoire d’informatique de Grenoble, 38000 Grenoble, France;
Search for more papers by this author
,
Bruno Gaujal
Bruno Gaujal
[email protected]
University of Grenoble Alpes, Institut national de recherche en informatique et en automatique, Centre national de la recherche scientifique, Grenoble Institut national polytechnique, Laboratoire d’informatique de Grenoble, 38000 Grenoble, France;
Search for more papers by this author
,
Chen Yan
Corresponding Author
Chen Yan
[email protected]
https://orcid.org/0000-0002-0551-4786
STATIFY, Institut national de recherche en informatique et en automatique, 38334 Saint Ismier, France;Biostatistics and Spatial Processes, Institut national de recherche pour l’agriculture, l’alimentation et l’environnement, 84914 Avignon, France
Search for more papers by this author

University of Grenoble Alpes, Institut national de recherche en informatique et en automatique, Centre national de la recherche scientifique, Grenoble Institut national polytechnique, Laboratoire d’informatique de Grenoble, 38000 Grenoble, France;

Search for more papers by this author

Bruno Gaujal

[email protected]

Search for more papers by this author

Chen Yan

Corresponding Author

Chen Yan

[email protected]

https://orcid.org/0000-0002-0551-4786

STATIFY, Institut national de recherche en informatique et en automatique, 38334 Saint Ismier, France;Biostatistics and Spatial Processes, Institut national de recherche pour l’agriculture, l’alimentation et l’environnement, 84914 Avignon, France

Search for more papers by this author

Published Online:1 Dec 2023https://doi.org/10.1287/moor.2022.0101

References

[1] Altman E (1999) Constrained Markov Decision Processes (Chapman and Hall, London).Google Scholar
[2] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, New York).Crossref, Google Scholar
[3] Brown DB, Smith JE (2020) Index policies and performance bounds for dynamic selection problems. Management Sci. 66:3029–3050.Link, Google Scholar
[4] Brown DB, Zhang J (2022) Fluid policies, reoptimization, and performance guarantees in dynamic resource allocation. Preprint, submitted November 5, http://dx.doi.org/10.2139/ssrn.4267615.Google Scholar
[5] Gallego G, Van Ryzin G (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.Link, Google Scholar
[6] Gast N, Gaujal B, Khun K (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97:391–436.Crossref, Google Scholar
[7] Gast N, Gaujal B, Yan C (2022) The LP-update policy for weakly coupled Markov decision processes. Preprint, submitted November 3, https://arxiv.org/abs/2211.01961.Google Scholar
[8] Gast N, Gaujal B, Yan C (2023) Exponential asymptotic optimality of Whittle index policy. Queueing Systems 104:107–150.Crossref, Google Scholar
[9] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B. Methodological 41(2):148–177.Crossref, Google Scholar
[10] Hong Y, Xie Q, Chen Y, Wang W (2023) Restless bandits with average reward: Breaking the uniform global attractor assumption. Preprint, submitted May 31, https://arxiv.org/abs/2306.00196.Google Scholar
[11] Hu W, Frazier P (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https://arxiv.org/abs/1707.00205.Google Scholar
[12] Ioannidis S, Yeh E (2016) Adaptive caching networks with optimality guarantees. ACM SIGMETRICS Performance Evaluation Rev. 44(1):113–124.Google Scholar
[13] Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
[14] Matousek J, Gärtner B (2006) Understanding and Using Linear Programming (Springer Science & Business Media, New York).Google Scholar
[15] Mehrotra S, Ye Y (1993) Finding an interior point in the optimal face of linear programs. Math. Programming 62(1):497–515.Crossref, Google Scholar
[16] Niño-Mora J (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15:161–198.Crossref, Google Scholar
[17] Niño-Mora J (2020) A fast-pivoting algorithm for Whittle’s restless bandit index. Mathematics 8(12):2226.Crossref, Google Scholar
[18] Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
[19] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. (John Wiley & Sons, New York).Crossref, Google Scholar
[20] Verloop M (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.Crossref, Google Scholar
[21] Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
[22] Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. Ser. B. Methodological 42(2):143–149.Crossref, Google Scholar
[23] Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A:287–298.Crossref, Google Scholar
[24] Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Adv. Neural Inform. Processing Systems 28.Google Scholar
[25] Zayas-Cabán G, Jasin S, Wang G (2019) An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Adv. Appl. Probab. 51(3):745–772.Crossref, Google Scholar
[26] Zhang X, Frazier PI (2021) Restless bandits with many arms: Beating the central limit theorem. Preprint, submitted July 25, https://arxiv.org/abs/2107.11911.Google Scholar
[27] Zhang X, Frazier PI (2022) Near-optimality for infinite-horizon restless bandits with many arms. Preprint, submitted March 29, https://arxiv.org/abs/2203.15853.Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 4

November 2024

Pages 2049-2802, C2

Article Information

Metrics

Information

Received:April 12, 2022
Accepted:October 30, 2023
Published Online:December 01, 2023

Cite as

Nicolas Gast, Bruno Gaujal, Chen Yan (2023) Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality. Mathematics of Operations Research 49(4):2468-2491.

https://doi.org/10.1287/moor.2022.0101

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

References

Volume 49, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News