A Sampling-Based Gittins Index Approximation
Published Online:19 Mar 2026https://doi.org/10.1287/moor.2023.0225
References
- [1] (2011) Properties of the Gittins index with application to optimal scheduling. Probab. Engrg. Inform. Sci. 25(3):269–288.Crossref, Google Scholar
- [2] (2006) Measure Theory and Probability Theory (Springer, New York).Google Scholar
- [3] (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.Crossref, Google Scholar
- [4] (2022) Pandora’s box problem with order constraints. Math. Oper. Res. 48(1):498–519.Link, Google Scholar
- [5] (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [6] (1971) Martingale central limit theorems. Ann. Math. Statist. 42(1):59–66.Crossref, Google Scholar
- [7] (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundation Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- [8] (2013) Bounded regret in stochastic multi-armed bandits. Shalev-Shwartz S, Steinwart I, eds. Proc. 26th Annual Conf. Learn. Theory, vol. 30 (PMLR, New York), 122–134.Google Scholar
- [9] (2014) Multi-Armed Bandits, Gittins Index, and Its Calculation (John Wiley & Sons, Hoboken, NY), 416–435.Crossref, Google Scholar
- [10] (2018) Beating the curse of dimensionality in options pricing and optimal stopping. Preprint, submitted August 17, https://arxiv.org/abs/1807.02227v2.Google Scholar
- [11] (2002) A sequential particle filter method for static models. Biometrika 89(3):539–552.Crossref, Google Scholar
- [12] (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
- [13] (1979) Conjugate priors for exponential families. Ann. Statist. 7(2):269–281.Crossref, Google Scholar
- [14] (2019) Practical calculation of Gittins indices for multi-armed bandits. Preprint, submitted September 11, https://arxiv.org/abs/1909.05075.Google Scholar
- [15] (2022) Optimistic Gittins indices. Oper. Res. 70(6):3432–3456.Link, Google Scholar
- [16] (1986) An upper class law of the iterated logarithm for supermartingales. Sankhyā A 48(3):267–272.Google Scholar
- [17] (2018) Two-armed restless bandits with imperfect information: Stochastic control and indexability. Math. Oper. Res. 43(2):399–427.Link, Google Scholar
- [18] (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97:391–436.Crossref, Google Scholar
- [19] (2017) Fundamentals of Nonparametric Bayesian Inference (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [20] (1996) Markov Chain Monte Carlo in Practice (CRC Press, Boca Raton, FL).Crossref, Google Scholar
- [21] (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B Methodological 41(2):148–164.Crossref, Google Scholar
- [22] (1974) A dynamic allocation index for the sequential design of experiments. Gani J, Sarkadi K, Vincze I, eds. Progress in Statistics (European Meeting of Statisticians, Budapest, 1972) (North Holland, Amsterdam), 241–266.Google Scholar
- [23] (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NY).Crossref, Google Scholar
- [24] (1980) On randomized dynamic allocation indices for the sequential design of experiments. J. Roy. Statist. Soc. Ser. B Methodological 42(3):342–346.Crossref, Google Scholar
- [25] (1982) On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Probab. 19(3):716–722.Crossref, Google Scholar
- [26] (1983) Optimal strategies for families of alternative bandit processes. IEEE Trans. Automatic Control 28(8):858–861.Crossref, Google Scholar
- [27] (2009) A generalized Gittins index for a class of multiarmed bandits with general resource requirements. Math. Oper. Res. 34(1):26–44.Link, Google Scholar
- [28] (2018) On Bayesian index policies for sequential resource allocation. Ann. Statist. 46(2):842–865.Crossref, Google Scholar
- [29] (2012) On Bayesian upper confidence bounds for bandit problems. Lawrence ND, Girolami M, eds. Proc. Fifteenth Internat. Conf. Artificial Intelligence Statist., vol. 22 (PMLR, New York), 592–600.Google Scholar
- [30] (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learning Theory (Springer, Heidelberg, Germany), 199–213.Crossref, Google Scholar
- [31] (2014) Algorithms for multi-armed bandit problems. Preprint, submitted February 25, https://arxiv.org/abs/1402.6028.Google Scholar
- [32] (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.Crossref, Google Scholar
- [33] (1978) Limit theorems for weighted sums and stochastic approximation processes. Proc. Natl. Acad. Sci. USA 75(3):1068–1070.Crossref, Google Scholar
- [34] (1979) Adaptive design and stochastic approximation. Ann. Statist. 7(6):1196–1221.Crossref, Google Scholar
- [35] (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- [36] (2016) Regret analysis of the finite-horizon Gittins index strategy for multi-armed bandits. Feldman V, Rakhlin A, Shamir O, eds. 29th Annual Conf. Learn. Theory, vol. 49 (PMLR, New York), 1214–1245.Google Scholar
- [37] (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [38] (2013) On the complexity of solving Markov decision problems. Preprint, submitted February 20, https://arxiv.org/abs/1302.4971.Google Scholar
- [39] (2008) Multi-armed bandit problems. Hero A, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 121–151.Crossref, Google Scholar
- [40] (2020) Predictively consistent prior effective sample sizes. Biometrics 76(2):578–587.Crossref, Google Scholar
- [41] (2011) Computing a classic index for finite-horizon bandits. INFORMS J. Comput. 23(2):254–267.Link, Google Scholar
- [42] (2006) Optimal Stopping and Free-Boundary Problems (Birkhäuser, Basel, Switzerland).Google Scholar
- [43] (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NY).Crossref, Google Scholar
- [44] (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
- [45] (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
- [46] (2023) Response-adaptive randomization in clinical trials: From myths to practical considerations. Statist. Sci. 38(2):185–208.Google Scholar
- [47] (1995) Convergence rates for Markov chains. SIAM Rev. 37(3):387–405.Crossref, Google Scholar
- [48] (2019) Introduction to multi-armed bandits. Foundation Trends Machine Learn. 12(1–2):1–286.Crossref, Google Scholar
- [49] (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
- [50] (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.Crossref, Google Scholar
- [51] (2015) Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule. Biometrics 71(4):969–978.Crossref, Google Scholar
- [52] (1991) Gittins indices and constrained allocation in clinical trials. Biometrika 78(1):101–111.Crossref, Google Scholar
- [53] (1993) Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genetics Selection Evolution 25(1):41–62.Crossref, Google Scholar
- [54] (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.Crossref, Google Scholar
- [55] (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
- [56] (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Crossref, Google Scholar
- [57] (2006) Some results on the Gittins index for a normal reward process. IMS Lecture Notes Monograph Ser. 52:284–294.Google Scholar

