Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

Yige Hong
Corresponding Author
Yige Hong
[email protected]
https://orcid.org/0000-0001-8534-1063
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Search for more papers by this author
,
Qiaomin Xie
Qiaomin Xie
[email protected]
https://orcid.org/0000-0003-2834-6866
Department of Industrial and Systems Engineering, University of Wisconsin–Madison, Madison, Wisconsin 53706
Search for more papers by this author
,
Yudong Chen
Yudong Chen
[email protected]
https://orcid.org/0000-0002-6416-5635
Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin 53706
Search for more papers by this author
,
Weina Wang
Weina Wang
[email protected]
https://orcid.org/0000-0001-6808-0156
Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213
Search for more papers by this author

Yige Hong

Corresponding Author

Yige Hong

[email protected]

https://orcid.org/0000-0001-8534-1063

Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Search for more papers by this author

Qiaomin Xie

[email protected]

https://orcid.org/0000-0003-2834-6866

Department of Industrial and Systems Engineering, University of Wisconsin–Madison, Madison, Wisconsin 53706

Search for more papers by this author

Yudong Chen

[email protected]

https://orcid.org/0000-0002-6416-5635

Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin 53706

Search for more papers by this author

Weina Wang

[email protected]

https://orcid.org/0000-0001-6808-0156

Computer Science Department, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Search for more papers by this author

Published Online:11 Dec 2025https://doi.org/10.1287/moor.2024.0678

References

[1] Bertsimas D, Niño Mora J (2000) Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1):80–90.Link, Google Scholar
[2] Brown DB, Smith JE (2020) Index policies and performance bounds for dynamic selection problems. Management Sci. 66(7):3029–3050.Link, Google Scholar
[3] Brown DB, Zhang J (2022) Dynamic programs with shared resources and signals: Dynamic fluid policies and asymptotic optimality. Oper. Res. 70(5):3015–3033.Link, Google Scholar
[4] Brown DB, Zhang J (2025) Fluid policies, reoptimization, and performance guarantees in dynamic resource allocation. Oper. Res. 73(2):1029–1045.Link, Google Scholar
[5] D’Aeth JC, Ghosal S, Grimm F, Haw D, Koca E, Lau K, Liu H, et al. (2023) Optimal hospital care scheduling during the SARS-CoV-2 pandemic. Management Sci. 69(10):5923–5947.Link, Google Scholar
[6] Durrett R (2019) Probability: Theory and Examples, 5th ed. (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[7] Gast N, Gaujal B, Khun K (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97(3):391–436.Crossref, Google Scholar
[8] Gast N, Gaujal B, Yan C (2020) Exponential convergence rate for the asymptotic optimality of Whittle index policy. Preprint, submitted December 16, https://arxiv.org/abs/2012.09064.Google Scholar
[9] Gast N, Gaujal B, Yan C (2023) Exponential asymptotic optimality of Whittle index policy. Queueing Systems 104(1):107–150.Crossref, Google Scholar
[10] Gast N, Gaujal B, Yan C (2024) Linear program-based policies for restless bandits: Necessary and sufficient conditions for (exponentially fast) asymptotic optimality. Math. Oper. Res. 49(4):2468–2491.Link, Google Scholar
[11] Gast N, Gaujal B, Yan C (2024) Reoptimization nearly solves weakly coupled Markov decision processes. Preprint, submitted May 6, https://arxiv.org/abs/2211.01961.Google Scholar
[12] Ghosh A, Nagaraj D, Jain M, Tambe M (2023) Indexability is not enough for Whittle: Improved, near-optimal algorithms for restless bandits. Proc. Internat. Conf. Autonomous Agents Multiagent Systems (International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC), 1294–1302.Google Scholar
[13] Goldsztajn D, Avrachenkov K (2024) Asymptotically optimal policies for weakly coupled Markov decision processes. Preprint, submitted December 6, https://arxiv.org/abs/2406.04751.Google Scholar
[14] Hawkins JT (2003) A Langrangian decomposition approach to weakly coupled dynamic optimization problems and its applications. PhD thesis, Massachusetts Institute of Technology, Cambridge.Google Scholar
[15] Hong Y, Xie Q, Chen Y, Wang W (2023) Restless bandits with average reward: Breaking the uniform global attractor assumption. Adv. Neural Inform. Processing Systems 36:12810–12844.Google Scholar
[16] Hong Y, Xie Q, Chen Y, Wang W (2024) Achieving exponential asymptotic optimality in average-reward restless bandits without global attractor assumption. Preprint, submitted October 17, https://arxiv.org/abs/2405.17882.Google Scholar
[17] Hong Y, Xie Q, Chen Y, Wang W (2024) Unichain and aperiodicity are sufficient for asymptotic optimality of average-reward restless bandits. https://github.com/YigeHong/rb-focus-set/.Google Scholar
[18] Hu W, Frazier P (2017) An asymptotically optimal index policy for finite-horizon restless bandits. Preprint, submitted July 1, https://arxiv.org/abs/1707.00205.Google Scholar
[19] Niño-Mora J (2023) Markovian restless bandits and index policies: A review. Mathematics 11(7):1639.Crossref, Google Scholar
[20] Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
[21] Puterman ML (2005) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NJ).Google Scholar
[22] Verloop IM (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.Crossref, Google Scholar
[23] Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
[24] Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25:287–298.Crossref, Google Scholar
[25] Yan C (2024) An optimal-control approach to infinite-horizon restless bandits: Achieving asymptotic optimality with minimal assumptions. Proc. IEEE Conf. Decision Control CDC (IEEE, Piscataway, NJ), 6665–6672.Google Scholar
[26] Zayas-Cabán G, Jasin S, Wang G (2019) An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits. Adv. Appl. Probab. 51:745–772.Crossref, Google Scholar
[27] Zhang X, Frazier PI (2021) Restless bandits with many arms: Beating the central limit theorem. Preprint, submitted July 25, https://arxiv.org/abs/2107.11911.Google Scholar
[28] Zhang X, Frazier PI (2022) Near-optimality for infinite-horizon restless bandits with many arms. Preprint, submitted March 29, https://arxiv.org/abs/2203.15853.Google Scholar
[29] Zhang X, Hong Y, Wang W (2025) Projection-based Lyapunov method for fully heterogeneous weakly-coupled MDPs. Preprint, submitted February 9, https://arxiv.org/abs/2502.06072.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:September 09, 2024
Accepted:August 31, 2025
Published Online:December 11, 2025

Cite as

Yige Hong, Qiaomin Xie, Yudong Chen, Weina Wang (2025) Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0678

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Unichain and Aperiodicity Are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News