Dual-Directed Algorithm Design for Efficient Pure Exploration

Published Online:https://doi.org/10.1287/opre.2023.0590

References

  • Agrawal S, Juneja S, Glynn P (2020) Optimal δ-correct best-arm selection for heavy-tailed distributions. Kontorovich A, Neu G, eds. Proc. 31st Internat. Conf. Algorithmic Learn. Theory, vol. 117 (PMLR, Cambridge, MA), 61–110.Google Scholar
  • Al Marjani A, Kocak T, Garivier A (2022) On the complexity of all ε-best arms identification. Amini M-R, Canu S, Fischer A, Guns T, Novak PK, Tsoumakas G, eds. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Switzerland), 317–332.Google Scholar
  • Apostol TM (1969) Calculus, vol. 2, 2nd ed. (John Wiley & Sons, New York).Google Scholar
  • Avci H, Nelson BL, Song E, Wächter A (2023) Using cache or credit for parallel ranking and selection. ACM Trans. Modeling Comput. Simulations 33(4):1–28.CrossrefGoogle Scholar
  • Bandyopadhyay A, Juneja SK, Agrawal S (2024) Optimal top two method for best arm identification and fluid analysis. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. Proc. 38th Ann. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Bechhofer RE (1954) A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist. 25(1):16–39.CrossrefGoogle Scholar
  • Bubeck S, Wang T, Viswanathan N (2013) Multiple identifications in multi-armed bandits. Proc. Internat. Conf. Machine Learn. (JMLR, Cambridge, MA), 258–265.Google Scholar
  • Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Proc. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Chen Y, Ryzhov IO (2023) Balancing optimal large deviations in sequential selection. Management Sci. 69(6):3457–3473.LinkGoogle Scholar
  • Chen CH, He D, Fu M, Lee LH (2008) Efficient simulation budget allocation for selecting an optimal subset. INFORMS J. Comput. 20(4):579–595.LinkGoogle Scholar
  • Chen CH, Lin J, Yücesan E, Chick SE (2000) Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10:251–270.CrossrefGoogle Scholar
  • Chernoff H (1952) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.CrossrefGoogle Scholar
  • Cover TM, Thomas JA (2006) Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ).Google Scholar
  • Degenne R, Koolen WM (2019) Pure exploration with multiple correct answers. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 14591–14600.Google Scholar
  • Eckman DJ, Henderson SG (2018) Guarantees on the probability of good selection. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 351–365.Google Scholar
  • Fan W, Hong LJ, Nelson BL (2016) Indifference-zone-free selection of the best. Oper. Res. 64(6):1499–1514.LinkGoogle Scholar
  • Fan W, Hong LJ, Jiang G, Luo J (2024) Review of large-scale simulation optimization. Preprint, submitted March 23, https://arxiv.org/abs/2403.15669.Google Scholar
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.LinkGoogle Scholar
  • Frazier PI, Powell WB, Dayanik S (2008) A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47(5):2410–2439.CrossrefGoogle Scholar
  • Fu MC, Henderson SG (2017) History of seeking better solutions, aka simulation optimization. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 131–157.Google Scholar
  • Gao S, Chen W (2015) A note on the subset selection for simulation optimization. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 3768–3776.Google Scholar
  • Garivier A, Kaufmann E (2016) Optimal best arm identification with fixed confidence. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 998–1027.Google Scholar
  • Garivier A, Kaufmann E (2021) Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models. Sequential Anal. 40(1):61–96.CrossrefGoogle Scholar
  • Glynn P, Juneja S (2004) A large deviations perspective on ordinal optimization. Proc. Winter Simulation Conf. (ACM, New York).Google Scholar
  • Glynn P, Juneja S (2018) Selecting the best system and multi-armed bandits. Preprint, submitted July 16, https://arxiv.org/abs/1507.04564.Google Scholar
  • Graepel T, Candela JQ, Borchert T, Herbrich R (2010) Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. Proc. 27th Internat. Conf. Machine Learn. (Omnipress, Madison, WI).Google Scholar
  • Hong LJ, Fan W, Luo J (2021) Review on ranking and selection: A new perspective. Frontiers Engrg. Management 8(3):321–343.CrossrefGoogle Scholar
  • Hong LJ, Jiang G, Zhong Y (2022) Solving large-scale fixed-budget ranking and selection problems. INFORMS J. Comput. 34(6):2930–2949.LinkGoogle Scholar
  • Jennison C, Johnstone IM, Turnbull BW (1982) Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. Statistical Decision Theory and Related Topics III (Elsevier, Amsterdam), 55–86.CrossrefGoogle Scholar
  • Jourdan M, Degenne R, Kaufmann E (2023a) Dealing with unknown variances in best-arm identification. Proc. Internat. Conf. Algorithmic Learn. Theory (JMLR, Cambridge, MA), 776–849.Google Scholar
  • Jourdan M, Degenne R, Kaufmann E (2023b) An ε-best-arm identification algorithm for fixed-confidence and beyond. Proc. 37th Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Jourdan M, Degenne R, Baudry D, de Heide R, Kaufmann E (2022) Top two algorithms revisited. Adv. Neural Inform. Processing Systems 35:26791–26803.Google Scholar
  • Kalyanakrishnan S, Tewari A, Auer P, Stone P (2012) PAC subset selection in stochastic multi-armed bandits. Proc. 29th Internat. Conf. Machine Learn., vol. 12 (Omnipress, Madison, WI), 655–662. Google Scholar
  • Kato M, Ariu K (2024) The role of contextual information in best arm identification. Preprint, submitted June 26, https://arxiv.org/abs/2106.14077.Google Scholar
  • Kaufmann E, Kalyanakrishnan S (2013) Information complexity in bandit subset selection. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 228–251.Google Scholar
  • Kaufmann E, Koolen WM (2021) Mixture martingales revisited with applications to sequential tests and confidence intervals. J. Machine Learn. Res. 22(1):11140–11183.Google Scholar
  • Kaufmann E, Koolen WM, Garivier A (2018) Sequential test for the lowest mean: From Thompson to Murphy sampling. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31. (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Kim SH, Nelson BL (2001) A fully sequential procedure for indifference-zone selection in simulation. ACM Trans. Modeling Comput. Simulations 11(3):251–273.CrossrefGoogle Scholar
  • Komiyama J (2024) Suboptimal performance of the bayes optimal algorithm in frequentist best arm identification. Preprint, submitted February 10, https://arxiv.org/abs/2202.05193.Google Scholar
  • Lai TL, Liao OYW (2012) Efficient adaptive randomization and stopping rules in multi-arm clinical trials for testing a new treatment. Sequence Anal. 31(4):441–457.CrossrefGoogle Scholar
  • Li Z, Fan W, Hong LJ (2025) The (surprising) sample optimality of greedy procedures for large-scale ranking and selection. Management Sci. 71(2):1238–1259.LinkGoogle Scholar
  • Locatelli A, Gutzeit M, Carpentier A (2016) An optimal algorithm for the thresholding bandit problem. Proc. Internat. Conf. Machine Learn. (JMLR, Cambridge, MA), 1690–1698.Google Scholar
  • Mason B, Jain L, Tripathy A, Nowak R (2020) Finding all ϵ-good arms in stochastic bandits. Adv. Neural Inform. Processing Systems 33:20707–20718.Google Scholar
  • Ménard P (2019) Gradient ascent for active exploration in bandit problems. Preprint, submitted May 20, https://arxiv.org/abs/1905.08165.Google Scholar
  • Nelson BL, Matejcik FJ (1995) Using common random numbers for indifference-zone selection and multiple comparisons in simulation. Management Sci. 41(12):1935–1945.LinkGoogle Scholar
  • Perng SK (1969) A comparison of the asymptotic expected sample sizes of two sequential procedures for ranking problem. Ann. Math. Statist. 40(6):2198–2202.CrossrefGoogle Scholar
  • Pryzant R, Iter D, Li J, Lee YT, Zhu C, Zeng M (2023) Automatic prompt optimization with “gradient descent” and beam search. Proc. Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA).Google Scholar
  • Qiao G, Tewari A (2024) An asymptotically optimal algorithm for the convex hull membership problem. Preprint, submitted February 3, https://arxiv.org/abs/2302.02033.Google Scholar
  • Qin C, Klabjan D, Russo D (2017) Improving the expected improvement algorithm. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30. (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Russo D (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.LinkGoogle Scholar
  • Russo D, Roy BV, Kazerouni A, Osband I, Wen Z (2018) A tutorial on Thompson sampling. Frontiers Machine Learn. 11(1):1–96.CrossrefGoogle Scholar
  • Shang X, Kaufmann E, Valko M (2019) A simple dynamic bandit algorithm for hyper-parameter tuning. Proc. 6th ICML Workshop Automated Machine Learn. (ICML, San Diego, CA).Google Scholar
  • Shang X, de Heide R, Ménard P, Kaufmann E, Valko M (2020) Fixed-confidence guarantees for Bayesian best-arm identification. Proc. Internat. Conf. Artificial Intelligence Statist. (JMLR, Cambridge, MA), 1823–1832.Google Scholar
  • Simchi-Levi D, Zheng Z, Zhu F (2024) A simple and optimal policy design with safety against heavy-tailed risk for stochastic bandits. Management Sci., ePub ahead of print October 30, https://doi.org/10.1287/mnsc.2022.03512.Google Scholar
  • Simchowitz M, Jamieson K, Recht B (2017) The Simulator: Understanding adaptive sampling in the moderate-confidence regime. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 1794–1834.Google Scholar
  • Tirinzoni A, Degenne R (2022) On elimination strategies for bandit fixed-confidence identification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol 35. (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Wang PA, Tzeng RC, Proutiere A (2021) Fast pure exploration via Frank-Wolfe. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34. (Curran Associates Inc., Red Hook, NY).Google Scholar
  • Wang W, Wan H, Chen X (2024) Bonferroni-free and indifference-zone-flexible sequential elimination procedures for ranking and selection. Oper. Res. 72(5):2119–2134.LinkGoogle Scholar
  • Wu D, Zhou E (2018a) Analyzing and provably improving fixed budget ranking and selection algorithms. Preprint, submitted November 26, https://arxiv.org/abs/1811.12183.Google Scholar
  • Wu D, Zhou E (2018b) Provably improving the optimal computing budget allocation algorithm. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 1921–1932.Google Scholar
  • You W, Qin C, Wang Z, Yang S (2023) Information-directed selection for top-two algorithms. Proc. 36th Annual Conf. Learn. Theory (JMLR, Cambridge, MA), 2850–2851.Google Scholar
  • Zhang G, Peng Y, Zhang J, Zhou E (2023) Asymptotically optimal sampling policy for selecting top-m alternatives. INFORMS J. Comput. 35(6):1261–1285.LinkGoogle Scholar
  • Zhong Y, Hong LJ (2022) Knockout-tournament procedures for large-scale ranking and selection in parallel computing environments. Oper. Res. 70(1):432–453.LinkGoogle Scholar
  • Zhou X, Hao B, Lattimore T, Kang J, Li L (2024) Sequential best-arm identification with application to P300 speller. Trans. Machine Learn. Res. (MIT Press, Cambridge, MA).Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.