A Sampling-Based Gittins Index Approximation

Published Online:https://doi.org/10.1287/moor.2023.0225

References

  • [1] Aalto S, Ayesta U, Righter R (2011) Properties of the Gittins index with application to optimal scheduling. Probab. Engrg. Inform. Sci. 25(3):269–288.CrossrefGoogle Scholar
  • [2] Athreya KB, Lahiri SN (2006) Measure Theory and Probability Theory (Springer, New York).Google Scholar
  • [3] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.CrossrefGoogle Scholar
  • [4] Boodaghians S, Fusco F, Lazos P, Leonardi S (2022) Pandora’s box problem with order constraints. Math. Oper. Res. 48(1):498–519.LinkGoogle Scholar
  • [5] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [6] Brown BM (1971) Martingale central limit theorems. Ann. Math. Statist. 42(1):59–66.CrossrefGoogle Scholar
  • [7] Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundation Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • [8] Bubeck S, Perchet V, Rigollet P (2013) Bounded regret in stochastic multi-armed bandits. Shalev-Shwartz S, Steinwart I, eds. Proc. 26th Annual Conf. Learn. Theory, vol. 30 (PMLR, New York), 122–134.Google Scholar
  • [9] Chakravorty J, Mahajan A (2014) Multi-Armed Bandits, Gittins Index, and Its Calculation (John Wiley & Sons, Hoboken, NY), 416–435.CrossrefGoogle Scholar
  • [10] Chen Y, Goldberg DA (2018) Beating the curse of dimensionality in options pricing and optimal stopping. Preprint, submitted August 17, https://arxiv.org/abs/1807.02227v2.Google Scholar
  • [11] Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3):539–552.CrossrefGoogle Scholar
  • [12] Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.CrossrefGoogle Scholar
  • [13] Diaconis P, Ylvisaker D (1979) Conjugate priors for exponential families. Ann. Statist. 7(2):269–281.CrossrefGoogle Scholar
  • [14] Edwards J (2019) Practical calculation of Gittins indices for multi-armed bandits. Preprint, submitted September 11, https://arxiv.org/abs/1909.05075.Google Scholar
  • [15] Farias VF, Gutin E (2022) Optimistic Gittins indices. Oper. Res. 70(6):3432–3456.LinkGoogle Scholar
  • [16] Fisher E (1986) An upper class law of the iterated logarithm for supermartingales. Sankhyā A 48(3):267–272.Google Scholar
  • [17] Fryer R, Harms P (2018) Two-armed restless bandits with imperfect information: Stochastic control and indexability. Math. Oper. Res. 43(2):399–427.LinkGoogle Scholar
  • [18] Gast N, Gaujal B, Khun K (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97:391–436.CrossrefGoogle Scholar
  • [19] Ghosal S, Van der Vaart A (2017) Fundamentals of Nonparametric Bayesian Inference (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [20] Gilks WR, Richardson S, Spiegelhalter D (1996) Markov Chain Monte Carlo in Practice (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • [21] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B Methodological 41(2):148–164.CrossrefGoogle Scholar
  • [22] Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. Gani J, Sarkadi K, Vincze I, eds. Progress in Statistics (European Meeting of Statisticians, Budapest, 1972) (North Holland, Amsterdam), 241–266.Google Scholar
  • [23] Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NY).CrossrefGoogle Scholar
  • [24] Glazebrook KD (1980) On randomized dynamic allocation indices for the sequential design of experiments. J. Roy. Statist. Soc. Ser. B Methodological 42(3):342–346.CrossrefGoogle Scholar
  • [25] Glazebrook KD (1982) On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Probab. 19(3):716–722.CrossrefGoogle Scholar
  • [26] Glazebrook KD (1983) Optimal strategies for families of alternative bandit processes. IEEE Trans. Automatic Control 28(8):858–861.CrossrefGoogle Scholar
  • [27] Glazebrook KD, Minty R (2009) A generalized Gittins index for a class of multiarmed bandits with general resource requirements. Math. Oper. Res. 34(1):26–44.LinkGoogle Scholar
  • [28] Kaufmann E (2018) On Bayesian index policies for sequential resource allocation. Ann. Statist. 46(2):842–865.CrossrefGoogle Scholar
  • [29] Kaufmann E, Cappé O, Garivier A (2012) On Bayesian upper confidence bounds for bandit problems. Lawrence ND, Girolami M, eds. Proc. Fifteenth Internat. Conf. Artificial Intelligence Statist., vol. 22 (PMLR, New York), 592–600.Google Scholar
  • [30] Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learning Theory (Springer, Heidelberg, Germany), 199–213.CrossrefGoogle Scholar
  • [31] Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. Preprint, submitted February 25, https://arxiv.org/abs/1402.6028.Google Scholar
  • [32] Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.CrossrefGoogle Scholar
  • [33] Lai TL, Robbins H (1978) Limit theorems for weighted sums and stochastic approximation processes. Proc. Natl. Acad. Sci. USA 75(3):1068–1070.CrossrefGoogle Scholar
  • [34] Lai TL, Robbins H (1979) Adaptive design and stochastic approximation. Ann. Statist. 7(6):1196–1221.CrossrefGoogle Scholar
  • [35] Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • [36] Lattimore T (2016) Regret analysis of the finite-horizon Gittins index strategy for multi-armed bandits. Feldman V, Rakhlin A, Shamir O, eds. 29th Annual Conf. Learn. Theory, vol. 49 (PMLR, New York), 1214–1245.Google Scholar
  • [37] Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [38] Littman ML, Dean TL, Kaelbling LP (2013) On the complexity of solving Markov decision problems. Preprint, submitted February 20, https://arxiv.org/abs/1302.4971.Google Scholar
  • [39] Mahajan A, Teneketzis D (2008) Multi-armed bandit problems. Hero A, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 121–151.CrossrefGoogle Scholar
  • [40] Neuenschwander B, Weber S, Schmidli H, O’Hagan A (2020) Predictively consistent prior effective sample sizes. Biometrics 76(2):578–587.CrossrefGoogle Scholar
  • [41] Niño-Mora J (2011) Computing a classic index for finite-horizon bandits. INFORMS J. Comput. 23(2):254–267.LinkGoogle Scholar
  • [42] Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems (Birkhäuser, Basel, Switzerland).Google Scholar
  • [43] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NY).CrossrefGoogle Scholar
  • [44] Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.CrossrefGoogle Scholar
  • [45] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • [46] Robertson DS, Lee KM, López-Kolkovska BC, Villar SS (2023) Response-adaptive randomization in clinical trials: From myths to practical considerations. Statist. Sci. 38(2):185–208.Google Scholar
  • [47] Rosenthal JS (1995) Convergence rates for Markov chains. SIAM Rev. 37(3):387–405.CrossrefGoogle Scholar
  • [48] Slivkins A (2019) Introduction to multi-armed bandits. Foundation Trends Machine Learn. 12(1–2):1–286.CrossrefGoogle Scholar
  • [49] Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.CrossrefGoogle Scholar
  • [50] Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.CrossrefGoogle Scholar
  • [51] Villar SS, Wason J, Bowden J (2015) Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule. Biometrics 71(4):969–978.CrossrefGoogle Scholar
  • [52] Wang YG (1991) Gittins indices and constrained allocation in clinical trials. Biometrika 78(1):101–111.CrossrefGoogle Scholar
  • [53] Wang C, Rutledge JJ, Gianola D (1993) Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genetics Selection Evolution 25(1):41–62.CrossrefGoogle Scholar
  • [54] Weber R (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.CrossrefGoogle Scholar
  • [55] Weber R, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.CrossrefGoogle Scholar
  • [56] Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.CrossrefGoogle Scholar
  • [57] Yao YC (2006) Some results on the Gittins index for a normal reward process. IMS Lecture Notes Monograph Ser. 52:284–294.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.