Linear Program-Based Policies for Restless Bandits: Necessary and Sufficient Conditions for (Exponentially Fast) Asymptotic Optimality

Published Online:https://doi.org/10.1287/moor.2022.0101

References

  • [1] Altman E (1999) Constrained Markov Decision Processes (Chapman and Hall, London).Google Scholar
  • [2] Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, New York).CrossrefGoogle Scholar
  • [3] Brown DB, Smith JE (2020) Index policies and performance bounds for dynamic selection problems. Management Sci. 66:3029–3050.LinkGoogle Scholar
  • [4] Brown DB, Zhang J (2022) Fluid policies, reoptimization, and performance guarantees in dynamic resource allocation. Preprint, submitted November 5, http://dx.doi.org/10.2139/ssrn.4267615.Google Scholar
  • [5] Gallego G, Van Ryzin G (1997) A multiproduct dynamic pricing problem and its applications to network yield management. Oper. Res. 45(1):24–41.LinkGoogle Scholar
  • [6] Gast N, Gaujal B, Khun K (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97:391–436.CrossrefGoogle Scholar
  • [7] Gast N, Gaujal B, Yan C (2022) The LP-update policy for weakly coupled Markov decision processes. Preprint, submitted November 3, https://arxiv.org/abs/2211.01961.Google Scholar
  • [8] Gast N, Gaujal B, Yan C (2023) Exponential asymptotic optimality of Whittle index policy. Queueing Systems 104:107–150.CrossrefGoogle Scholar
  • [9] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B. Methodological 41(2):148–177.CrossrefGoogle Scholar
  • [10] Hong Y, Xie Q, Chen Y, Wang W (2023) Restless bandits with average reward: Breaking the uniform global attractor assumption. Preprint, submitted May 31, https://arxiv.org/abs/2306.00196.Google Scholar
  • [11] Hu W, Frazier P (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https://arxiv.org/abs/1707.00205.Google Scholar
  • [12] Ioannidis S, Yeh E (2016) Adaptive caching networks with optimality guarantees. ACM SIGMETRICS Performance Evaluation Rev. 44(1):113–124.Google Scholar
  • [13] Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.LinkGoogle Scholar
  • [14] Matousek J, Gärtner B (2006) Understanding and Using Linear Programming (Springer Science & Business Media, New York).Google Scholar
  • [15] Mehrotra S, Ye Y (1993) Finding an interior point in the optimal face of linear programs. Math. Programming 62(1):497–515.CrossrefGoogle Scholar
  • [16] Niño-Mora J (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15:161–198.CrossrefGoogle Scholar
  • [17] Niño-Mora J (2020) A fast-pivoting algorithm for Whittle’s restless bandit index. Mathematics 8(12):2226.CrossrefGoogle Scholar
  • [18] Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.LinkGoogle Scholar
  • [19] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st ed. (John Wiley & Sons, New York).CrossrefGoogle Scholar
  • [20] Verloop M (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.CrossrefGoogle Scholar
  • [21] Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.CrossrefGoogle Scholar
  • [22] Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. Ser. B. Methodological 42(2):143–149.CrossrefGoogle Scholar
  • [23] Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25A:287–298.CrossrefGoogle Scholar
  • [24] Wu H, Srikant R, Liu X, Jiang C (2015) Algorithms with logarithmic or sublinear regret for constrained contextual bandits. Adv. Neural Inform. Processing Systems 28.Google Scholar
  • [25] Zayas-Cabán G, Jasin S, Wang G (2019) An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Adv. Appl. Probab. 51(3):745–772.CrossrefGoogle Scholar
  • [26] Zhang X, Frazier PI (2021) Restless bandits with many arms: Beating the central limit theorem. Preprint, submitted July 25, https://arxiv.org/abs/2107.11911.Google Scholar
  • [27] Zhang X, Frazier PI (2022) Near-optimality for infinite-horizon restless bandits with many arms. Preprint, submitted March 29, https://arxiv.org/abs/2203.15853.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.