Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Yige Hong (Corresponding Author)
https://orcid.org/0000-0001-8534-1063
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Qiaomin Xie
https://orcid.org/0000-0003-2834-6866
Department of Industrial and Systems Engineering, University of Wisconsin–Madison, Madison, Wisconsin 53706

Yudong Chen
https://orcid.org/0000-0002-6416-5635
Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin 53706

Weina Wang
https://orcid.org/0000-0001-6808-0156
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Published Online: 11 Dec 2025
https://doi.org/10.1287/moor.2024.0678

Abstract

We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies designed to drive a progressively larger subset of arms toward the optimal distribution. We show that our policies are asymptotically optimal with an $O(1/\sqrt{N})$ optimality gap for an $N$-armed problem, assuming only that the problem is unichain and aperiodic. Our approach departs from most existing work, which focuses on index or priority policies that rely on the Global Attractor Property to guarantee convergence to the optimum, and from a recently developed simulation-based policy that requires a Synchronization Assumption.

Funding: Y. Hong and W. Wang are supported in part by the U.S. National Science Foundation (NSF) [Grants ECCS-2145713, CCF-2403194, CCF-2428569, and ECCS-2432545]. Y. Chen is supported in part by the NSF [Grant CCF-2233152]. Q. Xie is supported in part by the NSF [Grants CNS-1955997, ECCS-2339794, and ECCS-2432546].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/moor.2024.0678.

Mathematics of Operations Research

Articles In Advance

Information

Received: September 09, 2024
Accepted: August 31, 2025
Published Online: December 11, 2025

Cite as

Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang (2025) Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0678
