A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits
Published Online:24 Oct 2019https://doi.org/10.1287/moor.2019.0998
References
- [1] (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems 21(3–4):267–291.Crossref, Google Scholar
- [2] (2003) Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57:21–39.Crossref, Google Scholar
- [3] (2018) Whittle index policy for crawling ephemeral content. IEEE Trans. Control Network Systems 5(1):446–455.Crossref, Google Scholar
- [4] (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.Link, Google Scholar
- [5] (2010) Indexability of bandit problems with response delays. Probab. Engrg. Inform. Sci. 24(3):349–374.Crossref, Google Scholar
- [6] (2000) The Lebesgue–Stieltjes Integral: A Practical Introduction (Springer, New York).Crossref, Google Scholar
- [7] (1980) A characterization of waiting time performance realizable by single-server queues. Oper. Res. 28(3-part-ii):810–821.Link, Google Scholar
- [8] (2015) Multi-armed bandits under general depreciation and commitment. Probab. Engrg. Inform. Sci. 29(1):51–76.Crossref, Google Scholar
- [9] (2015) When are Kalman-filter restless bandits indexable? Proc. 28th Internat. Conf. Neural Inform. Processing Systems (NIPS 2015) (MIT Press, Cambridge, MA), 1711–1719.Google Scholar
- [10] (2019) Optimal policies for observing time series and related restless bandit problems. J. Mach. Learn. Res. 20:1–93.Google Scholar
- [11] (1994) Measure Theory (Springer, New York).Crossref, Google Scholar
- [12] (2012) On the substitution rule for Lebesgue–Stieltjes integrals. Expositiones Math. 30(4):412–418.Crossref, Google Scholar
- [13] (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. B 41(2):148–177.Crossref, Google Scholar
- [14] (1977) Linear programming of dynamic programs with unbounded rewards. Math. Oper. Res. 24:94–105.Google Scholar
- [15] (1979) Solving a general discounted dynamic program by linear programming. Probab. Theory Related Fields 48:339–346.Google Scholar
- [16] (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).Crossref, Google Scholar
- [17] (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).Crossref, Google Scholar
- [18] (1984) Stochastic Models in Operations Research, Vol. II: Stochastic Optimization (McGraw-Hill, New York).Google Scholar
- [19] (1987) The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2):262–268.Link, Google Scholar
- [20] (2016) Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [21] (2017) Wireless channel selection with restless bandits. Boucherie RJ, VanDijk NM, eds. Markov Decision Processes in Practice (Springer, Cham, Switzerland), 463–485.Crossref, Google Scholar
- [22] (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.Crossref, Google Scholar
- [23] (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Proc. Amer. Control Conf. (IEEE, New York), 4220–4225.Crossref, Google Scholar
- [24] (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.Crossref, Google Scholar
- [25] (1975) On dynamic programming with unbounded rewards. Management Sci. 21(11):1225–1233.Link, Google Scholar
- [26] (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.Crossref, Google Scholar
- [27] (2018) On the Whittle index for restless multi-armed hidden Markov bandits. IEEE Trans. Automatic Control 63(9):3046–3053.Crossref, Google Scholar
- [28] (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.Crossref, Google Scholar
- [29] (2002) Dynamic allocation indices for restless projects and queueing admission control: A polyhedral approach. Math. Programming 93(3):361–413.Crossref, Google Scholar
- [30] (2006) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31(1):50–84.Link, Google Scholar
- [31] (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15(2):161–198.Crossref, Google Scholar
- [32] (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th Euro-NGI Conf. Next Generation Internet Networks (NGI 2008) (IEEE, New York), 231–238.Google Scholar
- [33] (2009) A restless bandit marginal productivity index for opportunistic spectrum access with sensing errors. Núñez-Queija R, Resing J, eds. Proc. 3rd EuroNF Conf. Network Control Optim. (NETCOOP 2009), Lecture Notes in Computer Science, vol. 5894 (Springer, Berlin), 60–74.Google Scholar
- [34] Conservation laws and related applications. Cochran JJ, Cox LA Jr, Keskinocak P, Kharoufeh JP, Smith JC, eds. Wiley Encyclopedia of Operations Research and Management Science (Wiley, New York).Google Scholar
- [35] (2012) Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies. Comput. Oper. Res. 39(12):3431–3444.Crossref, Google Scholar
- [36] (2014) A dynamic page-refresh index policy for web crawlers. Sericola B, Telek M, Horváth G, eds. Proc. 21st Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2014), Lecture Notes in Computer Science, vol. 8499 (Springer, Berlin), 46–60.Google Scholar
- [37] (2015) A verification theorem for indexability of discrete time real state discounted restless bandits. Preprint, arXiv:1512.04403v1, submitted December 14, https://arxiv.org/abs/1512.04403.Google Scholar
- [38] (2016) Whittle’s index policy for multi-target tracking with jamming and nondetections. Wittevrongel S, PhungDuc T, eds. Proc. 23rd Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2016), Lecture Notes in Computer Science, vol. 9845 (Springer, Berlin), 210–222.Google Scholar
- [39] (2019) Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization. Comput. Oper. Res. 103:221–236.Crossref, Google Scholar
- [40] (2009) Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. Proc. Joint 48th IEEE Conf. Decision Control 28th Chinese Control Conf. (CDC/CCC 2009) (IEEE, New York), 2905–2910.Google Scholar
- [41] (2011) Sensor scheduling for hunting elusive hiding targets via Whittle’s restless bandit index policy. Proc. 5th Internat. Conf. Network Games Control Optim. (NETGCOOP 2011) (IEEE, New York), 124–131.Google Scholar
- [42] (2006) Convex Functions and Their Applications: A Contemporary Approach (Springer, New York).Crossref, Google Scholar
- [43] (2016) Downlink scheduling over Markovian fading channels. IEEE/ACM Trans. Network 24(3):1801–1812.Crossref, Google Scholar
- [44] (2015) Exploiting channel memory for joint estimation and scheduling in downlink networks—a Whittle’s indexability analysis. IEEE Trans. Inform. Theory 61(4):1702–1719.Crossref, Google Scholar
- [45] (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.Link, Google Scholar
- [46] (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).Crossref, Google Scholar
- [47] (1992) Multiclass queueing systems: Polymatroidal structure and optimal scheduling control. Oper. Res. 40(3-supplement-2):S293–S299.Link, Google Scholar
- [48] (2014) Index policies for demand response. IEEE Trans. Power Systems 29(3):1287–1295.Crossref, Google Scholar
- [49] (1998) Supermodularity and Complementarity (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
- [50] (1996) Scheduling a multiclass make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.Link, Google Scholar
- [51] (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.Crossref, Google Scholar
- [52] (2012) Restless Bandit Index Policies for Dynamic Sensor Scheduling Optimization. PhD thesis, Carlos III University of Madrid, Madrid.Google Scholar
- [53] (2008) Application of multi-armed bandits to sensor management. Hero AO, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 153–175.Crossref, Google Scholar
- [54] (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
- [55] (1977) Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2):326–335.Crossref, Google Scholar
- [56] (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B 42:143–149.Google Scholar
- [57] (1988) Restless bandits: Activity allocation in a changing world. Gani J, ed. A Celebration of Applied Probability: J. Appl. Probab. vol. 25A (Applied Probability Trust, Sheffield, UK), 287–298.Google Scholar
- [58] (1996) Optimal Control: Basics and Beyond (Wiley, Chichester, UK).Google Scholar

