Rate-Optimal Bayesian Simple Regret in Best Arm Identification

Published Online:https://doi.org/10.1287/moor.2022.0011

References

  • [1] Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Conf. Learn. Theory (JMLR.org), 39.1–39.26.Google Scholar
  • [2] Audibert J, Bubeck S, Munos R (2010) Best arm identification in multi-armed bandits. Kalai AT, Mohri M, eds. Conf. Learn. Theory (Omnipress Inc., Pensylvania), 41–53.Google Scholar
  • [3] Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.CrossrefGoogle Scholar
  • [4] Bubeck S, Munos R, Stoltz G (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Comput. Sci. 412(19):1832–1852.CrossrefGoogle Scholar
  • [5] Bull AD (2011) Convergence rates of efficient global optimization algorithms. J. Machine Learn. Res. 12(88):2879–2904.Google Scholar
  • [6] Carpentier A, Locatelli A (2016) Tight (lower) bounds for the fixed budget best arm identification bandit problem. Feldman V, Rakhlin A, Shamir O, eds. Conf. Learn. Theory (JMLR.org), 590–604.Google Scholar
  • [7] Chen CH, Lin J, Yücesan E, Chick SE (2000) Simulation budget allocation for further enhancing theefficiency of ordinal optimization. Discrete Event Dynamic Systems 10(3):251–270.CrossrefGoogle Scholar
  • [8] Chernoff H (1959) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.CrossrefGoogle Scholar
  • [9] Even-Dar E, Mannor S, Mansour Y, Mahadevan S (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7:1079–1105.Google Scholar
  • [10] Frazier PI (2018) Bayesian optimization. Recent advances in optimization and modeling of contemporary problems. INFORMS TutORials in Operations Research, 255–278.Google Scholar
  • [11] Gittins JC (1989) Multi-armed Bandit Allocation Indices (Wiley, Chichester, NY).Google Scholar
  • [12] Glynn PW, Juneja S (2004) A large deviations perspective on ordinal optimization. Conf. Winter Simulation (IEEE Computer Society), 577–585.Google Scholar
  • [13] Granmo OC (2008) A Bayesian learning automaton for solving two-armed Bernoulli bandit problems. Bramer M, Petridis M, Coenen F, eds. SGAI Internat. Conf. Artificial Intelligence (Springer, London), 23–30.Google Scholar
  • [14] Ho YC, Sreenivas R, Vakili P (1992) Ordinal optimization of DEDS. Discrete Event Dynamic Systems 2:61–88.CrossrefGoogle Scholar
  • [15] Honda J, Takemura A (2015) Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards. J. Machine Learn. Res. 16(113):3721–3756.Google Scholar
  • [16] Hong LJ, Fan W, Luo J (2021) Review on ranking and selection: A new perspective. Frontiers Engrg. Management 8(3):321–343.CrossrefGoogle Scholar
  • [17] Kaufmann E (2014) Analysis of Bayesian and frequentist strategies for sequential resource allocation. PhD thesis, Telecom ParisTech, France.Google Scholar
  • [18] Kaufmann E, Cappé O, Garivier A (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
  • [19] Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learn. Theory (Springer, Berlin, Germany), 199–213.CrossrefGoogle Scholar
  • [20] Komiyama J, Tsuchiya T, Honda J (2022) Minimax optimal algorithms for fixed-budget best arm identification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).Google Scholar
  • [21] Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.CrossrefGoogle Scholar
  • [22] Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • [23] Maron O, Moore AW (1997) The racing algorithm: Model selection for lazy learners. Artificial Intelligence Rev. 11(1):193–225.CrossrefGoogle Scholar
  • [24] Paulson E (1964) A sequential procedure for selecting the population with the largest mean from k normal populations. Ann. Math. Statist. 35(1):174–180.CrossrefGoogle Scholar
  • [25] Peng Y, Chen CH, Fu MC, Hu JQ (2016) Dynamic sampling allocation and design selection. INFORMS J. Comput. 28(2):195–208.LinkGoogle Scholar
  • [26] Qin C, Russo D (2022) Adaptivity and confounding in multi-armed bandit experiments. CoRRabs/2202.09036, URL https://arxiv.org/abs/2202.09036.Google Scholar
  • [27] Qin C, Klabjan D, Russo D (2017) Improving the expected improvement algorithm. Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R, eds. Adv. Neural Inform. Processing Systems 30 (Curran Associates, Inc., Red Hook, NY) 5381–5391.Google Scholar
  • [28] Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.CrossrefGoogle Scholar
  • [29] Russo D (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.LinkGoogle Scholar
  • [30] Russo D (2021) Technical note—A note on the equivalence of upper confidence bounds and Gittins indices for patient agents. Oper. Res. 69(1):273–278.LinkGoogle Scholar
  • [31] Russo D, Van Roy B (2018) Learning to optimize via information-directed sampling. Oper. Res. 66(1):230–252.LinkGoogle Scholar
  • [32] Ryzhov IO (2016) On the convergence rates of expected improvement methods. Oper. Res. 64(6):1515–1528.LinkGoogle Scholar
  • [33] Shahrampour S, Noshad M, Tarokh V (2017) On sequential elimination algorithms for best-arm identification in multi-armed bandits. IEEE Trans. Signal Processing 65(16):4281–4292.CrossrefGoogle Scholar
  • [34] Shang X, de Heide R, Ménard P, Kaufmann E, Valko M (2020) Fixed-confidence guarantees for Bayesian best-arm identification. Chiappa S, Calandra R, eds. Internat. Conf. Artificial Intelligence Statist. (JMLR.org), vol. 108, 1823–1832.Google Scholar
  • [35] Srinivas N, Krause A, Kakade SM, Seeger MW (2010) Gaussian process optimization in the bandit setting: No regret and experimental design. Furnkranz J, Joachims T, eds. Internat. Conf. Machine Learn. (JMLR.org), 1015–1022.Google Scholar
  • [36] Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.CrossrefGoogle Scholar
  • [37] Vakili S, Bouziani N, Jalali S, Bernacchia A, Shiu D-s (2021) Optimal order simple regret for Gaussian process bandits. Ranzato M, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW, eds. Adv. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 21202–21215.Google Scholar
  • [38] Weber R (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.