A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits

José Niño-Mora
Corresponding Author
José Niño-Mora
http://orcid.org/0000-0002-2172-3983
Department of Statistics, Carlos III University of Madrid, 28903 Getafe (Madrid), Spain
Search for more papers by this author

José Niño-Mora

Corresponding Author

José Niño-Mora

http://orcid.org/0000-0002-2172-3983

Department of Statistics, Carlos III University of Madrid, 28903 Getafe (Madrid), Spain

Search for more papers by this author

Published Online:24 Oct 2019https://doi.org/10.1287/moor.2019.0998

References

[1] Altman E, Stidham S Jr (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems 21(3–4):267–291.Crossref, Google Scholar
[2] Ansell PS, Glazebrook KD, Niño Mora J, O’Keeffe M (2003) Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57:21–39.Crossref, Google Scholar
[3] Avrachenkov KE, Borkar VS (2018) Whittle index policy for crawling ephemeral content. IEEE Trans. Control Network Systems 5(1):446–455.Crossref, Google Scholar
[4] Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.Link, Google Scholar
[5] Caro F, Yoo OS (2010) Indexability of bandit problems with response delays. Probab. Engrg. Inform. Sci. 24(3):349–374.Crossref, Google Scholar
[6] Carter M, van Brunt B (2000) The Lebesgue–Stieltjes Integral: A Practical Introduction (Springer, New York).Crossref, Google Scholar
[7] Coffman EG Jr, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper. Res. 28(3-part-ii):810–821.Link, Google Scholar
[8] Cowan W, Katehakis MN (2015) Multi-armed bandits under general depreciation and commitment. Probab. Engrg. Inform. Sci. 29(1):51–76.Crossref, Google Scholar
[9] Dance CR, Silander T (2015) When are Kalman-filter restless bandits indexable? Proc. 28th Internat. Conf. Neural Inform. Processing Systems (NIPS 2015) (MIT Press, Cambridge, MA), 1711–1719.Google Scholar
[10] Dance CR, Silander T (2019) Optimal policies for observing time series and related restless bandit problems. J. Mach. Learn. Res. 20:1–93.Google Scholar
[11] Doob JL (1994) Measure Theory (Springer, New York).Crossref, Google Scholar
[12] Falkner N, Teschl G (2012) On the substitution rule for Lebesgue–Stieltjes integrals. Expositiones Math. 30(4):412–418.Crossref, Google Scholar
[13] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. B 41(2):148–177.Crossref, Google Scholar
[14] Heilmann WR (1977) Linear programming of dynamic programs with unbounded rewards. Math. Oper. Res. 24:94–105.Google Scholar
[15] Heilmann WR (1979) Solving a general discounted dynamic program by linear programming. Probab. Theory Related Fields 48:339–346.Google Scholar
[16] Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).Crossref, Google Scholar
[17] Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).Crossref, Google Scholar
[18] Heyman DP, Sobel MJ (1984) Stochastic Models in Operations Research, Vol. II: Stochastic Optimization (McGraw-Hill, New York).Google Scholar
[19] Katehakis MN, Veinott AF Jr (1987) The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2):262–268.Link, Google Scholar
[20] Krishnamurthy V (2016) Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[21] Kuhn J, Nazarathy Y (2017) Wireless channel selection with restless bandits. Boucherie RJ, VanDijk NM, eds. Markov Decision Processes in Practice (Springer, Cham, Switzerland), 463–485.Crossref, Google Scholar
[22] La Scala BF, Moran B (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.Crossref, Google Scholar
[23] Le Ny J, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Proc. Amer. Control Conf. (IEEE, New York), 4220–4225.Crossref, Google Scholar
[24] Le Ny J, Feron E, Dahleh M (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.Crossref, Google Scholar
[25] Lippman SA (1975) On dynamic programming with unbounded rewards. Management Sci. 21(11):1225–1233.Link, Google Scholar
[26] Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
[27] Meshram R, Manjunath D, Gopalan A (2018) On the Whittle index for restless multi-armed hidden Markov bandits. IEEE Trans. Automatic Control 63(9):3046–3053.Crossref, Google Scholar
[28] Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.Crossref, Google Scholar
[29] Niño-Mora J (2002) Dynamic allocation indices for restless projects and queueing admission control: A polyhedral approach. Math. Programming 93(3):361–413.Crossref, Google Scholar
[30] Niño-Mora J (2006) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31(1):50–84.Link, Google Scholar
[31] Niño-Mora J (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15(2):161–198.Crossref, Google Scholar
[32] Niño-Mora J (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th Euro-NGI Conf. Next Generation Internet Networks (NGI 2008) (IEEE, New York), 231–238.Google Scholar
[33] Niño-Mora J (2009) A restless bandit marginal productivity index for opportunistic spectrum access with sensing errors. Núñez-Queija R, Resing J, eds. Proc. 3rd EuroNF Conf. Network Control Optim. (NETCOOP 2009), Lecture Notes in Computer Science, vol. 5894 (Springer, Berlin), 60–74.Google Scholar
[34] Niño-Mora J Conservation laws and related applications. Cochran JJ, Cox LA Jr, Keskinocak P, Kharoufeh JP, Smith JC, eds. Wiley Encyclopedia of Operations Research and Management Science (Wiley, New York).Google Scholar
[35] Niño-Mora J (2012) Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies. Comput. Oper. Res. 39(12):3431–3444.Crossref, Google Scholar
[36] Niño-Mora J (2014) A dynamic page-refresh index policy for web crawlers. Sericola B, Telek M, Horváth G, eds. Proc. 21st Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2014), Lecture Notes in Computer Science, vol. 8499 (Springer, Berlin), 46–60.Google Scholar
[37] Niño-Mora J (2015) A verification theorem for indexability of discrete time real state discounted restless bandits. Preprint, arXiv:1512.04403v1, submitted December 14, https://arxiv.org/abs/1512.04403.Google Scholar
[38] Niño-Mora J (2016) Whittle’s index policy for multi-target tracking with jamming and nondetections. Wittevrongel S, PhungDuc T, eds. Proc. 23rd Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2016), Lecture Notes in Computer Science, vol. 9845 (Springer, Berlin), 210–222.Google Scholar
[39] Niño-Mora J (2019) Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization. Comput. Oper. Res. 103:221–236.Crossref, Google Scholar
[40] Niño-Mora J, Villar SS (2009) Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. Proc. Joint 48th IEEE Conf. Decision Control 28th Chinese Control Conf. (CDC/CCC 2009) (IEEE, New York), 2905–2910.Google Scholar
[41] Niño-Mora J, Villar SS (2011) Sensor scheduling for hunting elusive hiding targets via Whittle’s restless bandit index policy. Proc. 5th Internat. Conf. Network Games Control Optim. (NETGCOOP 2011) (IEEE, New York), 124–131.Google Scholar
[42] Niculescu C, Persson LE (2006) Convex Functions and Their Applications: A Contemporary Approach (Springer, New York).Crossref, Google Scholar
[43] Ouyang WZ, Eryilmaz A, Shroff NB (2016) Downlink scheduling over Markovian fading channels. IEEE/ACM Trans. Network 24(3):1801–1812.Crossref, Google Scholar
[44] Ouyang WZ, Murugesan S, Eryilmaz A, Shroff NB (2015) Exploiting channel memory for joint estimation and scheduling in downlink networks—a Whittle’s indexability analysis. IEEE Trans. Inform. Theory 61(4):1702–1719.Crossref, Google Scholar
[45] Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
[46] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Crossref, Google Scholar
[47] Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: Polymatroidal structure and optimal scheduling control. Oper. Res. 40(3-supplement-2):S293–S299.Link, Google Scholar
[48] Taylor JA, Mathieu JL (2014) Index policies for demand response. IEEE Trans. Power Systems 29(3):1287–1295.Crossref, Google Scholar
[49] Topkis DM (1998) Supermodularity and Complementarity (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
[50] Veatch MH, Wein LM (1996) Scheduling a multiclass make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.Link, Google Scholar
[51] Verloop IM (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.Crossref, Google Scholar
[52] Villar S (2012) Restless Bandit Index Policies for Dynamic Sensor Scheduling Optimization. PhD thesis, Carlos III University of Madrid, Madrid.Google Scholar
[53] Washburn RB (2008) Application of multi-armed bandits to sensor management. Hero AO, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 153–175.Crossref, Google Scholar
[54] Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
[55] Wessels J (1977) Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2):326–335.Crossref, Google Scholar
[56] Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B 42:143–149.Google Scholar
[57] Whittle P (1988) Restless bandits: Activity allocation in a changing world. Gani J, ed. A Celebration of Applied Probability: J. Appl. Probab. vol. 25A (Applied Probability Trust, Sheffield, UK), 287–298.Google Scholar
[58] Whittle P (1996) Optimal Control: Basics and Beyond (Wiley, Chichester, UK).Google Scholar

cover image Mathematics of Operations Research

Volume 45, Issue 2

May 2020

Pages 403-795, C2

Article Information

Supplemental Material

Metrics

Information

Received:March 02, 2017
Accepted:February 26, 2019
Published Online:October 24, 2019

Cite as

José Niño-Mora (2019) A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits. Mathematics of Operations Research 45(2):465-496.

https://doi.org/10.1287/moor.2019.0998

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits

References

Volume 45, Issue 2

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News