A Sampling-Based Gittins Index Approximation

Stef Baas
Corresponding Author
Stef Baas
[email protected]
https://orcid.org/0000-0002-5890-1165
Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands
Search for more papers by this author
,
Richard J. Boucherie
Richard J. Boucherie
[email protected]
https://orcid.org/0000-0002-1046-2044
Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands
Search for more papers by this author
,
Aleida Braaksma
Aleida Braaksma
[email protected]
https://orcid.org/0000-0003-2296-0144
Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands
Search for more papers by this author

Corresponding Author

Stef Baas

Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands

Search for more papers by this author

Richard J. Boucherie

[email protected]

https://orcid.org/0000-0002-1046-2044

Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands

Search for more papers by this author

Aleida Braaksma

[email protected]

https://orcid.org/0000-0003-2296-0144

Stochastic Operations Research Group, Department of Applied Mathematics, University of Twente, 7500 AE Enschede, Netherlands

Search for more papers by this author

Published Online:19 Mar 2026https://doi.org/10.1287/moor.2023.0225

References

[1] Aalto S, Ayesta U, Righter R (2011) Properties of the Gittins index with application to optimal scheduling. Probab. Engrg. Inform. Sci. 25(3):269–288.Crossref, Google Scholar
[2] Athreya KB, Lahiri SN (2006) Measure Theory and Probability Theory (Springer, New York).Google Scholar
[3] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.Crossref, Google Scholar
[4] Boodaghians S, Fusco F, Lazos P, Leonardi S (2022) Pandora’s box problem with order constraints. Math. Oper. Res. 48(1):498–519.Link, Google Scholar
[5] Borkar VS (2008) Stochastic Approximation: A Dynamical Systems Viewpoint (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[6] Brown BM (1971) Martingale central limit theorems. Ann. Math. Statist. 42(1):59–66.Crossref, Google Scholar
[7] Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundation Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
[8] Bubeck S, Perchet V, Rigollet P (2013) Bounded regret in stochastic multi-armed bandits. Shalev-Shwartz S, Steinwart I, eds. Proc. 26th Annual Conf. Learn. Theory, vol. 30 (PMLR, New York), 122–134.Google Scholar
[9] Chakravorty J, Mahajan A (2014) Multi-Armed Bandits, Gittins Index, and Its Calculation (John Wiley & Sons, Hoboken, NY), 416–435.Crossref, Google Scholar
[10] Chen Y, Goldberg DA (2018) Beating the curse of dimensionality in options pricing and optimal stopping. Preprint, submitted August 17, https://arxiv.org/abs/1807.02227v2.Google Scholar
[11] Chopin N (2002) A sequential particle filter method for static models. Biometrika 89(3):539–552.Crossref, Google Scholar
[12] Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
[13] Diaconis P, Ylvisaker D (1979) Conjugate priors for exponential families. Ann. Statist. 7(2):269–281.Crossref, Google Scholar
[14] Edwards J (2019) Practical calculation of Gittins indices for multi-armed bandits. Preprint, submitted September 11, https://arxiv.org/abs/1909.05075.Google Scholar
[15] Farias VF, Gutin E (2022) Optimistic Gittins indices. Oper. Res. 70(6):3432–3456.Link, Google Scholar
[16] Fisher E (1986) An upper class law of the iterated logarithm for supermartingales. Sankhyā A 48(3):267–272.Google Scholar
[17] Fryer R, Harms P (2018) Two-armed restless bandits with imperfect information: Stochastic control and indexability. Math. Oper. Res. 43(2):399–427.Link, Google Scholar
[18] Gast N, Gaujal B, Khun K (2023) Testing indexability and computing Whittle and Gittins index in subcubic time. Math. Methods Oper. Res. 97:391–436.Crossref, Google Scholar
[19] Ghosal S, Van der Vaart A (2017) Fundamentals of Nonparametric Bayesian Inference (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[20] Gilks WR, Richardson S, Spiegelhalter D (1996) Markov Chain Monte Carlo in Practice (CRC Press, Boca Raton, FL).Crossref, Google Scholar
[21] Gittins JC (1979) Bandit processes and dynamic allocation indices. J. Roy. Statist. Soc. Ser. B Methodological 41(2):148–164.Crossref, Google Scholar
[22] Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. Gani J, Sarkadi K, Vincze I, eds. Progress in Statistics (European Meeting of Statisticians, Budapest, 1972) (North Holland, Amsterdam), 241–266.Google Scholar
[23] Gittins J, Glazebrook K, Weber R (2011) Multi-Armed Bandit Allocation Indices (John Wiley & Sons, Hoboken, NY).Crossref, Google Scholar
[24] Glazebrook KD (1980) On randomized dynamic allocation indices for the sequential design of experiments. J. Roy. Statist. Soc. Ser. B Methodological 42(3):342–346.Crossref, Google Scholar
[25] Glazebrook KD (1982) On the evaluation of suboptimal strategies for families of alternative bandit processes. J. Appl. Probab. 19(3):716–722.Crossref, Google Scholar
[26] Glazebrook KD (1983) Optimal strategies for families of alternative bandit processes. IEEE Trans. Automatic Control 28(8):858–861.Crossref, Google Scholar
[27] Glazebrook KD, Minty R (2009) A generalized Gittins index for a class of multiarmed bandits with general resource requirements. Math. Oper. Res. 34(1):26–44.Link, Google Scholar
[28] Kaufmann E (2018) On Bayesian index policies for sequential resource allocation. Ann. Statist. 46(2):842–865.Crossref, Google Scholar
[29] Kaufmann E, Cappé O, Garivier A (2012) On Bayesian upper confidence bounds for bandit problems. Lawrence ND, Girolami M, eds. Proc. Fifteenth Internat. Conf. Artificial Intelligence Statist., vol. 22 (PMLR, New York), 592–600.Google Scholar
[30] Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learning Theory (Springer, Heidelberg, Germany), 199–213.Crossref, Google Scholar
[31] Kuleshov V, Precup D (2014) Algorithms for multi-armed bandit problems. Preprint, submitted February 25, https://arxiv.org/abs/1402.6028.Google Scholar
[32] Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.Crossref, Google Scholar
[33] Lai TL, Robbins H (1978) Limit theorems for weighted sums and stochastic approximation processes. Proc. Natl. Acad. Sci. USA 75(3):1068–1070.Crossref, Google Scholar
[34] Lai TL, Robbins H (1979) Adaptive design and stochastic approximation. Ann. Statist. 7(6):1196–1221.Crossref, Google Scholar
[35] Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
[36] Lattimore T (2016) Regret analysis of the finite-horizon Gittins index strategy for multi-armed bandits. Feldman V, Rakhlin A, Shamir O, eds. 29th Annual Conf. Learn. Theory, vol. 49 (PMLR, New York), 1214–1245.Google Scholar
[37] Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[38] Littman ML, Dean TL, Kaelbling LP (2013) On the complexity of solving Markov decision problems. Preprint, submitted February 20, https://arxiv.org/abs/1302.4971.Google Scholar
[39] Mahajan A, Teneketzis D (2008) Multi-armed bandit problems. Hero A, Castañón D, Cochran D, Kastella K, eds. Foundations and Applications of Sensor Management (Springer, New York), 121–151.Crossref, Google Scholar
[40] Neuenschwander B, Weber S, Schmidli H, O’Hagan A (2020) Predictively consistent prior effective sample sizes. Biometrics 76(2):578–587.Crossref, Google Scholar
[41] Niño-Mora J (2011) Computing a classic index for finite-horizon bandits. INFORMS J. Comput. 23(2):254–267.Link, Google Scholar
[42] Peskir G, Shiryaev A (2006) Optimal Stopping and Free-Boundary Problems (Birkhäuser, Basel, Switzerland).Google Scholar
[43] Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming (John Wiley & Sons, Hoboken, NY).Crossref, Google Scholar
[44] Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58(5):527–535.Crossref, Google Scholar
[45] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
[46] Robertson DS, Lee KM, López-Kolkovska BC, Villar SS (2023) Response-adaptive randomization in clinical trials: From myths to practical considerations. Statist. Sci. 38(2):185–208.Google Scholar
[47] Rosenthal JS (1995) Convergence rates for Markov chains. SIAM Rev. 37(3):387–405.Crossref, Google Scholar
[48] Slivkins A (2019) Introduction to multi-armed bandits. Foundation Trends Machine Learn. 12(1–2):1–286.Crossref, Google Scholar
[49] Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3–4):285–294.Crossref, Google Scholar
[50] Villar SS, Bowden J, Wason J (2015) Multi-armed bandit models for the optimal design of clinical trials: Benefits and challenges. Statist. Sci. 30(2):199–215.Crossref, Google Scholar
[51] Villar SS, Wason J, Bowden J (2015) Response-adaptive randomization for multi-arm clinical trials using the forward looking Gittins index rule. Biometrics 71(4):969–978.Crossref, Google Scholar
[52] Wang YG (1991) Gittins indices and constrained allocation in clinical trials. Biometrika 78(1):101–111.Crossref, Google Scholar
[53] Wang C, Rutledge JJ, Gianola D (1993) Marginal inferences about variance components in a mixed linear model using Gibbs sampling. Genetics Selection Evolution 25(1):41–62.Crossref, Google Scholar
[54] Weber R (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.Crossref, Google Scholar
[55] Weber R, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.Crossref, Google Scholar
[56] Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.Crossref, Google Scholar
[57] Yao YC (2006) Some results on the Gittins index for a normal reward process. IMS Lecture Notes Monograph Ser. 52:284–294.Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:July 21, 2023
Accepted:February 08, 2026
Published Online:March 19, 2026

Cite as

Stef Baas, Richard J. Boucherie, Aleida Braaksma (2026) A Sampling-Based Gittins Index Approximation. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0225

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

A Sampling-Based Gittins Index Approximation

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News