The (Surprising) Sample Optimality of Greedy Procedures for Large-Scale Ranking and Selection

Zaile Li
https://orcid.org/0009-0000-8984-539X
Department of Management Science, School of Management, Fudan University, Shanghai 200433, China
Weiwei Fan
Corresponding Author
https://orcid.org/0000-0002-8550-4165
Advanced Institute of Business, School of Economics and Management, Tongji University, Shanghai 200092, China
L. Jeff Hong
https://orcid.org/0000-0001-7011-4001
Department of Management Science, School of Management, Fudan University, Shanghai 200433, China; and School of Data Science, Fudan University, Shanghai 200433, China

Published Online: 7 May 2024
https://doi.org/10.1287/mnsc.2023.00694

Volume 71, Issue 2

February 2025

Pages iv-vi, 955

Information

Received: March 04, 2023
Accepted: December 11, 2023
Published Online: May 07, 2024

Cite as

Zaile Li, Weiwei Fan, L. Jeff Hong (2024) The (Surprising) Sample Optimality of Greedy Procedures for Large-Scale Ranking and Selection. Management Science 71(2):1238–1259.

https://doi.org/10.1287/mnsc.2023.00694

Acknowledgments

The authors thank the department editor, associate editor, and anonymous reviewers for their time, efforts, and detailed comments, which helped greatly improve the manuscript.
