Rate-Optimal Bayesian Simple Regret in Best Arm Identification
References
- [1] (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Conf. Learn. Theory (JMLR.org), 39.1–39.26.Google Scholar
- [2] (2010) Best arm identification in multi-armed bandits. Kalai AT, Mohri M, eds. Conf. Learn. Theory (Omnipress Inc., Pensylvania), 41–53.Google Scholar
- [3] (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.Crossref, Google Scholar
- [4] (2011) Pure exploration in finitely-armed and continuous-armed bandits. Theoretical Comput. Sci. 412(19):1832–1852.Crossref, Google Scholar
- [5] (2011) Convergence rates of efficient global optimization algorithms. J. Machine Learn. Res. 12(88):2879–2904.Google Scholar
- [6] (2016) Tight (lower) bounds for the fixed budget best arm identification bandit problem. Feldman V, Rakhlin A, Shamir O, eds. Conf. Learn. Theory (JMLR.org), 590–604.Google Scholar
- [7] (2000) Simulation budget allocation for further enhancing theefficiency of ordinal optimization. Discrete Event Dynamic Systems 10(3):251–270.Crossref, Google Scholar
- [8] (1959) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.Crossref, Google Scholar
- [9] (2006) Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems. J. Machine Learn. Res. 7:1079–1105.Google Scholar
- [10] (2018) Bayesian optimization. Recent advances in optimization and modeling of contemporary problems. INFORMS TutORials in Operations Research, 255–278.Google Scholar
- [11] (1989) Multi-armed Bandit Allocation Indices (Wiley, Chichester, NY).Google Scholar
- [12] (2004) A large deviations perspective on ordinal optimization. Conf. Winter Simulation (IEEE Computer Society), 577–585.Google Scholar
- [13] (2008) A Bayesian learning automaton for solving two-armed Bernoulli bandit problems. Bramer M, Petridis M, Coenen F, eds. SGAI Internat. Conf. Artificial Intelligence (Springer, London), 23–30.Google Scholar
- [14] (1992) Ordinal optimization of DEDS. Discrete Event Dynamic Systems 2:61–88.Crossref, Google Scholar
- [15] (2015) Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards. J. Machine Learn. Res. 16(113):3721–3756.Google Scholar
- [16] (2021) Review on ranking and selection: A new perspective. Frontiers Engrg. Management 8(3):321–343.Crossref, Google Scholar
- [17] (2014) Analysis of Bayesian and frequentist strategies for sequential resource allocation. PhD thesis, Telecom ParisTech, France.Google Scholar
- [18] (2016) On the complexity of best-arm identification in multi-armed bandit models. J. Machine Learn. Res. 17(1):1–42.Google Scholar
- [19] (2012) Thompson sampling: An asymptotically optimal finite-time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds. Algorithmic Learn. Theory (Springer, Berlin, Germany), 199–213.Crossref, Google Scholar
- [20] (2022) Minimax optimal algorithms for fixed-budget best arm identification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY).Google Scholar
- [21] (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091–1114.Crossref, Google Scholar
- [22] (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
- [23] (1997) The racing algorithm: Model selection for lazy learners. Artificial Intelligence Rev. 11(1):193–225.Crossref, Google Scholar
- [24] (1964) A sequential procedure for selecting the population with the largest mean from k normal populations. Ann. Math. Statist. 35(1):174–180.Crossref, Google Scholar
- [25] (2016) Dynamic sampling allocation and design selection. INFORMS J. Comput. 28(2):195–208.Link, Google Scholar
- [26] (2022) Adaptivity and confounding in multi-armed bandit experiments. CoRRabs/2202.09036, URL https://arxiv.org/abs/2202.09036.Google Scholar
- [27] (2017) Improving the expected improvement algorithm. Guyon I, von Luxburg U, Bengio S, Wallach HM, Fergus R, Vishwanathan SVN, Garnett R, eds. Adv. Neural Inform. Processing Systems 30 (Curran Associates, Inc., Red Hook, NY) 5381–5391.Google Scholar
- [28] (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.Crossref, Google Scholar
- [29] (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.Link, Google Scholar
- [30] (2021) Technical note—A note on the equivalence of upper confidence bounds and Gittins indices for patient agents. Oper. Res. 69(1):273–278.Link, Google Scholar
- [31] (2018) Learning to optimize via information-directed sampling. Oper. Res. 66(1):230–252.Link, Google Scholar
- [32] (2016) On the convergence rates of expected improvement methods. Oper. Res. 64(6):1515–1528.Link, Google Scholar
- [33] (2017) On sequential elimination algorithms for best-arm identification in multi-armed bandits. IEEE Trans. Signal Processing 65(16):4281–4292.Crossref, Google Scholar
- [34] (2020) Fixed-confidence guarantees for Bayesian best-arm identification. Chiappa S, Calandra R, eds. Internat. Conf. Artificial Intelligence Statist. (JMLR.org), vol. 108, 1823–1832.Google Scholar
- [35] (2010) Gaussian process optimization in the bandit setting: No regret and experimental design. Furnkranz J, Joachims T, eds. Internat. Conf. Machine Learn. (JMLR.org), 1015–1022.Google Scholar
- [36] (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.Crossref, Google Scholar
- [37] (2021) Optimal order simple regret for Gaussian process bandits. Ranzato M, Beygelzimer A, Dauphin YN, Liang P, Vaughan JW, eds. Adv. Conf. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 21202–21215.Google Scholar
- [38] (1992) On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2(4):1024–1033.Crossref, Google Scholar

