Dual-Directed Algorithm Design for Efficient Pure Exploration
References
- (2020) Optimal δ-correct best-arm selection for heavy-tailed distributions. Kontorovich A, Neu G, eds. Proc. 31st Internat. Conf. Algorithmic Learn. Theory, vol. 117 (PMLR, Cambridge, MA), 61–110.Google Scholar
- (2022) On the complexity of all ε-best arms identification. Amini M-R, Canu S, Fischer A, Guns T, Novak PK, Tsoumakas G, eds. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Switzerland), 317–332.Google Scholar
- (1969) Calculus, vol. 2, 2nd ed. (John Wiley & Sons, New York).Google Scholar
- (2023) Using cache or credit for parallel ranking and selection. ACM Trans. Modeling Comput. Simulations 33(4):1–28.Crossref, Google Scholar
- (2024) Optimal top two method for best arm identification and fluid analysis. Globerson A, Mackey L, Belgrave D, Fan A, Paquet U, Tomczak J, Zhang C, eds. Proc. 38th Ann. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY).Google Scholar
- (1954) A single-sample multiple decision procedure for ranking means of normal populations with known variances. Ann. Math. Statist. 25(1):16–39.Crossref, Google Scholar
- (2013) Multiple identifications in multi-armed bandits. Proc. Internat. Conf. Machine Learn. (JMLR, Cambridge, MA), 258–265.Google Scholar
- (2011) An empirical evaluation of Thompson sampling. Proc. Adv. Neural Inform. Processing Systems, vol. 24 (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2023) Balancing optimal large deviations in sequential selection. Management Sci. 69(6):3457–3473.Link, Google Scholar
- (2008) Efficient simulation budget allocation for selecting an optimal subset. INFORMS J. Comput. 20(4):579–595.Link, Google Scholar
- (2000) Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynamic Systems 10:251–270.Crossref, Google Scholar
- (1952) Sequential design of experiments. Ann. Math. Statist. 30(3):755–770.Crossref, Google Scholar
- (2006) Elements of Information Theory (John Wiley & Sons, Inc., Hoboken, NJ).Google Scholar
- (2019) Pure exploration with multiple correct answers. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY), 14591–14600.Google Scholar
- (2018) Guarantees on the probability of good selection. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 351–365.Google Scholar
- (2016) Indifference-zone-free selection of the best. Oper. Res. 64(6):1499–1514.Link, Google Scholar
- (2024) Review of large-scale simulation optimization. Preprint, submitted March 23, https://arxiv.org/abs/2403.15669.Google Scholar
- (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.Link, Google Scholar
- (2008) A knowledge-gradient policy for sequential information collection. SIAM J. Control Optim. 47(5):2410–2439.Crossref, Google Scholar
- (2017) History of seeking better solutions, aka simulation optimization. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 131–157.Google Scholar
- (2015) A note on the subset selection for simulation optimization. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 3768–3776.Google Scholar
- (2016) Optimal best arm identification with fixed confidence. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 998–1027.Google Scholar
- (2021) Nonasymptotic sequential tests for overlapping hypotheses applied to near-optimal arm identification in bandit models. Sequential Anal. 40(1):61–96.Crossref, Google Scholar
- (2004) A large deviations perspective on ordinal optimization. Proc. Winter Simulation Conf. (ACM, New York).Google Scholar
- (2018) Selecting the best system and multi-armed bandits. Preprint, submitted July 16, https://arxiv.org/abs/1507.04564.Google Scholar
- (2010) Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft’s Bing search engine. Proc. 27th Internat. Conf. Machine Learn. (Omnipress, Madison, WI).Google Scholar
- (2021) Review on ranking and selection: A new perspective. Frontiers Engrg. Management 8(3):321–343.Crossref, Google Scholar
- (2022) Solving large-scale fixed-budget ranking and selection problems. INFORMS J. Comput. 34(6):2930–2949.Link, Google Scholar
- (1982) Asymptotically optimal procedures for sequential adaptive selection of the best of several normal means. Statistical Decision Theory and Related Topics III (Elsevier, Amsterdam), 55–86.Crossref, Google Scholar
- (2023a) Dealing with unknown variances in best-arm identification. Proc. Internat. Conf. Algorithmic Learn. Theory (JMLR, Cambridge, MA), 776–849.Google Scholar
- (2023b) An ε-best-arm identification algorithm for fixed-confidence and beyond. Proc. 37th Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2022) Top two algorithms revisited. Adv. Neural Inform. Processing Systems 35:26791–26803.Google Scholar
- (2012) PAC subset selection in stochastic multi-armed bandits. Proc. 29th Internat. Conf. Machine Learn., vol. 12 (Omnipress, Madison, WI), 655–662. Google Scholar
- (2024) The role of contextual information in best arm identification. Preprint, submitted June 26, https://arxiv.org/abs/2106.14077.Google Scholar
- (2013) Information complexity in bandit subset selection. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 228–251.Google Scholar
- (2021) Mixture martingales revisited with applications to sequential tests and confidence intervals. J. Machine Learn. Res. 22(1):11140–11183.Google Scholar
- (2018) Sequential test for the lowest mean: From Thompson to Murphy sampling. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 31. (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2001) A fully sequential procedure for indifference-zone selection in simulation. ACM Trans. Modeling Comput. Simulations 11(3):251–273.Crossref, Google Scholar
- (2024) Suboptimal performance of the bayes optimal algorithm in frequentist best arm identification. Preprint, submitted February 10, https://arxiv.org/abs/2202.05193.Google Scholar
- (2012) Efficient adaptive randomization and stopping rules in multi-arm clinical trials for testing a new treatment. Sequence Anal. 31(4):441–457.Crossref, Google Scholar
- (2025) The (surprising) sample optimality of greedy procedures for large-scale ranking and selection. Management Sci. 71(2):1238–1259.Link, Google Scholar
- (2016) An optimal algorithm for the thresholding bandit problem. Proc. Internat. Conf. Machine Learn. (JMLR, Cambridge, MA), 1690–1698.Google Scholar
- (2020) Finding all ϵ-good arms in stochastic bandits. Adv. Neural Inform. Processing Systems 33:20707–20718.Google Scholar
- (2019) Gradient ascent for active exploration in bandit problems. Preprint, submitted May 20, https://arxiv.org/abs/1905.08165.Google Scholar
- (1995) Using common random numbers for indifference-zone selection and multiple comparisons in simulation. Management Sci. 41(12):1935–1945.Link, Google Scholar
- (1969) A comparison of the asymptotic expected sample sizes of two sequential procedures for ranking problem. Ann. Math. Statist. 40(6):2198–2202.Crossref, Google Scholar
- (2023) Automatic prompt optimization with “gradient descent” and beam search. Proc. Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA).Google Scholar
- (2024) An asymptotically optimal algorithm for the convex hull membership problem. Preprint, submitted February 3, https://arxiv.org/abs/2302.02033.Google Scholar
- (2017) Improving the expected improvement algorithm. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 30. (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.Link, Google Scholar
- (2018) A tutorial on Thompson sampling. Frontiers Machine Learn. 11(1):1–96.Crossref, Google Scholar
- (2019) A simple dynamic bandit algorithm for hyper-parameter tuning. Proc. 6th ICML Workshop Automated Machine Learn. (ICML, San Diego, CA).Google Scholar
- (2020) Fixed-confidence guarantees for Bayesian best-arm identification. Proc. Internat. Conf. Artificial Intelligence Statist. (JMLR, Cambridge, MA), 1823–1832.Google Scholar
- (2024) A simple and optimal policy design with safety against heavy-tailed risk for stochastic bandits. Management Sci., ePub ahead of print October 30, https://doi.org/10.1287/mnsc.2022.03512.Google Scholar
- (2017) The Simulator: Understanding adaptive sampling in the moderate-confidence regime. Proc. Conf. Learn. Theory (JMLR, Cambridge, MA), 1794–1834.Google Scholar
- (2022) On elimination strategies for bandit fixed-confidence identification. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol 35. (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2021) Fast pure exploration via Frank-Wolfe. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34. (Curran Associates Inc., Red Hook, NY).Google Scholar
- (2024) Bonferroni-free and indifference-zone-flexible sequential elimination procedures for ranking and selection. Oper. Res. 72(5):2119–2134.Link, Google Scholar
- (2018a) Analyzing and provably improving fixed budget ranking and selection algorithms. Preprint, submitted November 26, https://arxiv.org/abs/1811.12183.Google Scholar
- (2018b) Provably improving the optimal computing budget allocation algorithm. Proc. Winter Simulation Conf. (IEEE, Piscataway, NJ), 1921–1932.Google Scholar
- (2023) Information-directed selection for top-two algorithms. Proc. 36th Annual Conf. Learn. Theory (JMLR, Cambridge, MA), 2850–2851.Google Scholar
- (2023) Asymptotically optimal sampling policy for selecting top-m alternatives. INFORMS J. Comput. 35(6):1261–1285.Link, Google Scholar
- (2022) Knockout-tournament procedures for large-scale ranking and selection in parallel computing environments. Oper. Res. 70(1):432–453.Link, Google Scholar
- (2024) Sequential best-arm identification with application to P300 speller. Trans. Machine Learn. Res. (MIT Press, Cambridge, MA).Google Scholar

