An Efficient Node Selection Policy for Monte Carlo Tree Search with Neural Networks
References
- (2010) Best arm identification in multi-armed bandits. Tauman Kalai A, Mohri M, eds. Proc. 23rd Annual Conf. Learn. Theory (COLT 2010), vol. 13 (OmniPress, Madison, WI), 41–53.Google Scholar
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47:235–256.Crossref, Google Scholar
- (1954) The theory of dynamic programming. Bull. Amer. Math. Soc. 60(6):503–515.Crossref, Google Scholar
- (2009) Pure exploration in multi-armed bandits problems. Gavaldà R, Lugosi G, Zeugmann T, Zilles S, eds. Algorithmic Learn. Theory ALT 2009, Lecture Notes in Computer Science, vol. 5809 (Springer, Berlin, Heidelberg), 23–37.Crossref, Google Scholar
- (2005) An adaptive sampling algorithm for solving Markov decision processes. Oper. Res. 53(1):126–139.Link, Google Scholar
- (2000) Simulation budget allocation for further enhancing the efficiency of ordinal optimization. Discrete Event Dynam. Systems 10:251–270.Crossref, Google Scholar
- (2004) A large deviations perspective on ordinal optimization. Ingalls RG, Rossetti MD, Smith JS, Peter BA, eds. Proc. 2004 Winter Simulation Conf. (WSC), vol. 1 (IEEE, Piscataway, NJ), 577–585.Google Scholar
- (2017) Monte-Carlo tree search by best arm identification. Adv. Neural Inform. Processing Systems 30:1–10.Google Scholar
- (2014) Adam: A method for stochastic optimization. Preprint, submitted December 22, https://arxiv.org/abs/1412.6980.Google Scholar
- (2006) Bandit based Monte-Carlo planning. Fürnkranz J, Scheffer T, Spiliopoulou M, eds. Machine Learn. ECML 2006, Lecture Notes in Computer Science, vol. 4212 (Springer, Berlin, Heidelberg), 282–293.Google Scholar
- (2020) Bandit Algorithms (Cambridge University Press, Cambridge, MA).Crossref, Google Scholar
- (2021) An optimal computing budget allocation tree policy for Monte Carlo tree search. IEEE Trans. Automatic Control 67(6):2685–2699.Crossref, Google Scholar
- (2024) Efficient learning for clustering and optimizing context-dependent designs. Oper. Res. 72(2):617–638.Link, Google Scholar
- (2024a) Training deep Q-network via Monte Carlo tree search for adaptive bitrate control in video delivery. Preprint, submitted March 13, http://dx.doi.org/10.2139/ssrn.4757662.Google Scholar
- (2024b) An efficient node selection policy for Monte Carlo tree search with neural networks. http://dx.doi.org/10.1287/ijoc.2023.0307.cd, https://github.com/INFORMSJoC/2023.0307.Google Scholar
- (2021) Reinforcement learning in discrete action space applied to inverse defect design. J. Phys. Comm. 5(3):031001.Crossref, Google Scholar
- (2011) Sample-based planning for continuous action Markov decision processes. Proc. Internat. Conf. Automated Planning Scheduling, vol. 21 (Association for the Advancement of Artificial Intelligence, Washington, DC), 335–338.Google Scholar
- OpenAI Gym (2021) OpenAI gym cartpole-v0. Accessed September 29, 2021, https://github.com/openai/gym/wiki/CartPole-v0.Google Scholar
- (2022) Thompson sampling meets ranking and selection. Feng B, Pedrielli G, Peng Y, Shashaani S, Song E, Corlu CG, Lee LH, Chew EP, Roeder T, Lendermann P, eds. Proc. 2022 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ), 3075–3086.Google Scholar
- (2018) Ranking and selection as stochastic control. IEEE Trans. Automatic Control 63(8):2359–2373.Crossref, Google Scholar
- (2020) Simple Bayesian algorithms for best-arm identification. Oper. Res. 68(6):1625–1647.Link, Google Scholar
- (2018) A Tutorial on Thompson Sampling, Foundations and Trends in Machine Learning, vol. 11 (now Publishers, Hanover, MA), 1–96.Crossref, Google Scholar
- (2020) Mastering Atari, Go, chess and shogi by planning with a learned model. Nature 588(7839):604–609.Crossref, Google Scholar
- (2018) Tractable sampling strategies for ordinal optimization. Oper. Res. 66(6):1693–1712.Link, Google Scholar
- (2017) Mastering the game of Go without human knowledge. Nature 550(7676):354–359.Crossref, Google Scholar
- (2014) Efficient sampling method for Monte Carlo tree search problem. IEICE Trans. Inform. Systems 97(3):392–398.Crossref, Google Scholar
- (2012) Bayesian inference in Monte-Carlo tree search. Preprint, submitted March 15, https://arxiv.org/abs/1203.3519.Google Scholar
- (2022) Planning for sample efficient imitation learning. Adv. Neural Inform. Processing Systems 35:2577–2589.Google Scholar
- (2020) Sequential sampling for a ranking and selection problem with exponential sampling distributions. Bae K-H, Feng B, Kim S, Lazarova-Molnar S, Zheng Z, Roeder T, Thiesing R, eds. Proc. 2020 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ), 2984–2995.Google Scholar
- (2023a) A simulation optimization method for scheduling automated guided vehicles in a stochastic warehouse management system. Corlu CG, Hunter SR, Lam H, Onggo BS, Shortle J, Biller B, eds. Proc. 2023 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ), 3238–3249.Google Scholar
- (2022) An efficient dynamic sampling policy for Monte Carlo tree search. Feng B, Pedrielli G, Peng Y, Shashaani S, Song E, Corlu CG, Lee LH, Chew EP, Roeder T, Lendermann P, eds. Proc. 2022 Winter Simulation Conf. (WSC) (IEEE, Piscataway, NJ) 2760–2771.Google Scholar
- (2024) Efficient learning for selecting top-m context-dependent designs. IEEE Trans. Automation Sci. Engrg., ePub ahead of print April 29, https://doi.org/10.1109/TASE.2024.3391020.Google Scholar
- (2023b) Efficient sampling policy for selecting a subset with the best. IEEE Trans. Automatic Control 68(8):4904–4911.Crossref, Google Scholar
- (2023c) Asymptotically optimal sampling policy for selecting top-m alternatives. INFORMS J. Comput. 35(6):1261–1285.Link, Google Scholar

