Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Published Online:https://doi.org/10.1287/moor.2024.0678

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an O(1/N) optimality gap for an N-armed problem, assuming only a unichain and aperiodicity assumption. Our approach departs from most existing work that focuses on index or priority policies, which rely on the Global Attractor Property to guarantee convergence to the optimum, or a recently developed simulation-based policy, which requires a Synchronization Assumption.

Funding: Y. Hong and W. Wang are supported in part by the U.S. National Science Foundation (NSF) [Grants ECCS-2145713, CCF-2403194, CCF-2428569, and ECCS-2432545]. Y. Chen is supported in part by the NSF [Grant CCF-2233152]. Q. Xie is supported in part by the NSF [Grants CNS-1955997, ECCS-2339794, and ECCS-2432546].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2024.0678.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.