A Verification Theorem for Threshold-Indexability of Real-State Discounted Restless Bandits

Published Online:https://doi.org/10.1287/moor.2019.0998

References

  • [1] Altman E, Stidham S Jr (1995) Optimality of monotonic policies for two-action Markovian decision processes, with applications to control of queues with delayed information. Queueing Systems 21(3–4):267–291.CrossrefGoogle Scholar
  • [2] Ansell PS, Glazebrook KD, Niño Mora J, O’Keeffe M (2003) Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57:21–39.CrossrefGoogle Scholar
  • [3] Avrachenkov KE, Borkar VS (2018) Whittle index policy for crawling ephemeral content. IEEE Trans. Control Network Systems 5(1):446–455.CrossrefGoogle Scholar
  • [4] Bertsimas D, Niño-Mora J (1996) Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21(2):257–306.LinkGoogle Scholar
  • [5] Caro F, Yoo OS (2010) Indexability of bandit problems with response delays. Probab. Engrg. Inform. Sci. 24(3):349–374.CrossrefGoogle Scholar
  • [6] Carter M, van Brunt B (2000) The Lebesgue–Stieltjes Integral: A Practical Introduction (Springer, New York).CrossrefGoogle Scholar
  • [7] Coffman EG Jr, Mitrani I (1980) A characterization of waiting time performance realizable by single-server queues. Oper. Res. 28(3-part-ii):810–821.LinkGoogle Scholar
  • [8] Cowan W, Katehakis MN (2015) Multi-armed bandits under general depreciation and commitment. Probab. Engrg. Inform. Sci. 29(1):51–76.CrossrefGoogle Scholar
  • [9] Dance CR, Silander T (2015) When are Kalman-filter restless bandits indexable? Proc. 28th Internat. Conf. Neural Inform. Processing Systems (NIPS 2015) (MIT Press, Cambridge, MA), 1711–1719.Google Scholar
  • [10] Dance CR, Silander T (2019) Optimal policies for observing time series and related restless bandit problems. J. Mach. Learn. Res. 20:1–93.Google Scholar
  • [11] Doob JL (1994) Measure Theory (Springer, New York).CrossrefGoogle Scholar
  • [12] Falkner N, Teschl G (2012) On the substitution rule for Lebesgue–Stieltjes integrals. Expositiones Math. 30(4):412–418.CrossrefGoogle Scholar
  • [13] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. B 41(2):148–177.CrossrefGoogle Scholar
  • [14] Heilmann WR (1977) Linear programming of dynamic programs with unbounded rewards. Math. Oper. Res. 24:94–105.Google Scholar
  • [15] Heilmann WR (1979) Solving a general discounted dynamic program by linear programming. Probab. Theory Related Fields 48:339–346.Google Scholar
  • [16] Hernández-Lerma O, Lasserre JB (1996) Discrete-Time Markov Control Processes: Basic Optimality Criteria (Springer, New York).CrossrefGoogle Scholar
  • [17] Hernández-Lerma O, Lasserre JB (1999) Further Topics on Discrete-Time Markov Control Processes (Springer, New York).CrossrefGoogle Scholar
  • [18] Heyman DP, Sobel MJ (1984) Stochastic Models in Operations Research, Vol. II: Stochastic Optimization (McGraw-Hill, New York).Google Scholar
  • [19] Katehakis MN, Veinott AF Jr (1987) The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2):262–268.LinkGoogle Scholar
  • [20] Krishnamurthy V (2016) Partially Observed Markov Decision Processes: From Filtering to Controlled Sensing (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [21] Kuhn J, Nazarathy Y (2017) Wireless channel selection with restless bandits. Boucherie RJ, VanDijk NM, eds. Markov Decision Processes in Practice (Springer, Cham, Switzerland), 463–485.CrossrefGoogle Scholar
  • [22] La Scala BF, Moran B (2006) Optimal target tracking with restless bandits. Digital Signal Processing 16(5):479–487.CrossrefGoogle Scholar
  • [23] Le Ny J, Dahleh M, Feron E (2008) Multi-UAV dynamic routing with partial observations using restless bandit allocation indices. Proc. Amer. Control Conf. (IEEE, New York), 4220–4225.CrossrefGoogle Scholar
  • [24] Le Ny J, Feron E, Dahleh M (2011) Scheduling continuous-time Kalman filters. IEEE Trans. Automatic Control 56(6):1381–1394.CrossrefGoogle Scholar
  • [25] Lippman SA (1975) On dynamic programming with unbounded rewards. Management Sci. 21(11):1225–1233.LinkGoogle Scholar
  • [26] Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.CrossrefGoogle Scholar
  • [27] Meshram R, Manjunath D, Gopalan A (2018) On the Whittle index for restless multi-armed hidden Markov bandits. IEEE Trans. Automatic Control 63(9):3046–3053.CrossrefGoogle Scholar
  • [28] Niño-Mora J (2001) Restless bandits, partial conservation laws and indexability. Adv. Appl. Probab. 33(1):76–98.CrossrefGoogle Scholar
  • [29] Niño-Mora J (2002) Dynamic allocation indices for restless projects and queueing admission control: A polyhedral approach. Math. Programming 93(3):361–413.CrossrefGoogle Scholar
  • [30] Niño-Mora J (2006) Restless bandit marginal productivity indices, diminishing returns and optimal control of make-to-order/make-to-stock M/G/1 queues. Math. Oper. Res. 31(1):50–84.LinkGoogle Scholar
  • [31] Niño-Mora J (2007) Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15(2):161–198.CrossrefGoogle Scholar
  • [32] Niño-Mora J (2008) An index policy for dynamic fading-channel allocation to heterogeneous mobile users with partial observations. Proc. 4th Euro-NGI Conf. Next Generation Internet Networks (NGI 2008) (IEEE, New York), 231–238.Google Scholar
  • [33] Niño-Mora J (2009) A restless bandit marginal productivity index for opportunistic spectrum access with sensing errors. Núñez-Queija R, Resing J, eds. Proc. 3rd EuroNF Conf. Network Control Optim. (NETCOOP 2009), Lecture Notes in Computer Science, vol. 5894 (Springer, Berlin), 60–74.Google Scholar
  • [34] Niño-Mora J Conservation laws and related applications. Cochran JJ, Cox LA Jr, Keskinocak P, Kharoufeh JP, Smith JC, eds. Wiley Encyclopedia of Operations Research and Management Science (Wiley, New York).Google Scholar
  • [35] Niño-Mora J (2012) Admission and routing of soft real-time jobs to multiclusters: Design and comparison of index policies. Comput. Oper. Res. 39(12):3431–3444.CrossrefGoogle Scholar
  • [36] Niño-Mora J (2014) A dynamic page-refresh index policy for web crawlers. Sericola B, Telek M, Horváth G, eds. Proc. 21st Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2014), Lecture Notes in Computer Science, vol. 8499 (Springer, Berlin), 46–60.Google Scholar
  • [37] Niño-Mora J (2015) A verification theorem for indexability of discrete time real state discounted restless bandits. Preprint, arXiv:1512.04403v1, submitted December 14, https://arxiv.org/abs/1512.04403.Google Scholar
  • [38] Niño-Mora J (2016) Whittle’s index policy for multi-target tracking with jamming and nondetections. Wittevrongel S, PhungDuc T, eds. Proc. 23rd Internat. Conf. Analytical Stochastic Modelling Tech. Appl. (ASMTA 2016), Lecture Notes in Computer Science, vol. 9845 (Springer, Berlin), 210–222.Google Scholar
  • [39] Niño-Mora J (2019) Resource allocation and routing in parallel multi-server queues with abandonments for cloud profit maximization. Comput. Oper. Res. 103:221–236.CrossrefGoogle Scholar
  • [40] Niño-Mora J, Villar SS (2009) Multitarget tracking via restless bandit marginal productivity indices and Kalman filter in discrete time. Proc. Joint 48th IEEE Conf. Decision Control 28th Chinese Control Conf. (CDC/CCC 2009) (IEEE, New York), 2905–2910.Google Scholar
  • [41] Niño-Mora J, Villar SS (2011) Sensor scheduling for hunting elusive hiding targets via Whittle’s restless bandit index policy. Proc. 5th Internat. Conf. Network Games Control Optim. (NETGCOOP 2011) (IEEE, New York), 124–131.Google Scholar
  • [42] Niculescu C, Persson LE (2006) Convex Functions and Their Applications: A Contemporary Approach (Springer, New York).CrossrefGoogle Scholar
  • [43] Ouyang WZ, Eryilmaz A, Shroff NB (2016) Downlink scheduling over Markovian fading channels. IEEE/ACM Trans. Network 24(3):1801–1812.CrossrefGoogle Scholar
  • [44] Ouyang WZ, Murugesan S, Eryilmaz A, Shroff NB (2015) Exploiting channel memory for joint estimation and scheduling in downlink networks—a Whittle’s indexability analysis. IEEE Trans. Inform. Theory 61(4):1702–1719.CrossrefGoogle Scholar
  • [45] Papadimitriou CH, Tsitsiklis JN (1999) The complexity of optimal queuing network control. Math. Oper. Res. 24(2):293–305.LinkGoogle Scholar
  • [46] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (Wiley, New York).CrossrefGoogle Scholar
  • [47] Shanthikumar JG, Yao DD (1992) Multiclass queueing systems: Polymatroidal structure and optimal scheduling control. Oper. Res. 40(3-supplement-2):S293–S299.LinkGoogle Scholar
  • [48] Taylor JA, Mathieu JL (2014) Index policies for demand response. IEEE Trans. Power Systems 29(3):1287–1295.CrossrefGoogle Scholar
  • [49] Topkis DM (1998) Supermodularity and Complementarity (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [50] Veatch MH, Wein LM (1996) Scheduling a multiclass make-to-stock queue: Index policies and hedging points. Oper. Res. 44(4):634–647.LinkGoogle Scholar
  • [51] Verloop IM (2016) Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 26(4):1947–1995.CrossrefGoogle Scholar
  • [52] Villar S (2012) Restless Bandit Index Policies for Dynamic Sensor Scheduling Optimization. PhD thesis, Carlos III University of Madrid, Madrid.Google Scholar
  • [53] Washburn RB (2008) Application of multi-armed bandits to sensor management. Hero AO, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 153–175.CrossrefGoogle Scholar
  • [54] Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.CrossrefGoogle Scholar
  • [55] Wessels J (1977) Markov programming by successive approximations with respect to weighted supremum norms. J. Math. Anal. Appl. 58(2):326–335.CrossrefGoogle Scholar
  • [56] Whittle P (1980) Multi-armed bandits and the Gittins index. J. Roy. Statist. Soc. B 42:143–149.Google Scholar
  • [57] Whittle P (1988) Restless bandits: Activity allocation in a changing world. Gani J, ed. A Celebration of Applied Probability: J. Appl. Probab. vol. 25A (Applied Probability Trust, Sheffield, UK), 287–298.Google Scholar
  • [58] Whittle P (1996) Optimal Control: Basics and Beyond (Wiley, Chichester, UK).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.