Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds
Published Online:11 Sep 2020https://doi.org/10.1287/opre.2019.1939
References
- (2016) The information-collecting vehicle routing problem: Stochastic optimization for emergency storm response. Preprint, submitted May 18, https://arxiv.org/abs/1605.05711.Google Scholar
- (2004) Primal-dual simulation algorithm for pricing multidimensional American options. Management Sci. 50(9):1222–1234.Link, Google Scholar
- (2002) Finite time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
- (2013) Continuous upper confidence trees with polynomial exploration—consistency. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Berlin, Heidelberg), 194–209.Google Scholar
- (2017) A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems. Eur. J. Oper. Res. 263(2):664–678.Crossref, Google Scholar
- (1992) Probability (Society of Industrial and Applied Mathematics, Philadelphia, PA).Crossref, Google Scholar
- (1997) Pricing American-style securities using simulation. J. Econom. Dynam. Control. 21(8–9):1323–1352.Crossref, Google Scholar
- (2011) Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Sci. 57(10):1752–1770.Link, Google Scholar
- (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.Link, Google Scholar
- (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4):785–801.Link, Google Scholar
- (2012) A survey of Monte Carlo tree search methods. IEEE Trans. Intelligence AI Games. 4(1):1–49.Crossref, Google Scholar
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
- (2009) Nested Monte-Carlo search. Proc. 21st Internat. Joint Conf. Artificial Intelligence, Pasadena, CA, 456–461.Google Scholar
- (2002) Forecast, solution, and rolling horizons in operations management problems: A classified bibliography. Manufacturing Service Oper. Management 4(1):25–43.Link, Google Scholar
- (2005) An adaptive sampling algorithm for solving Markov decision processes. Oper. Res. 53(1):126–139.Link, Google Scholar
- (2006) Monte-Carlo strategies for computer go. Proc. 18th Belgian-Dutch Conf. Artificial Intelligence, 83–90.Google Scholar
- (2008) Monte-Carlo tree search: A new framework for Game AI. Proc. 4th Artificial Intelligence Interactive Digital Entertainment Conf., 216–217.Google Scholar
- (2013) Simple policies for dynamic pricing with imperfect forecasts. Oper. Res. 61(3):612–624.Link, Google Scholar
- (2011) Continuous upper confidence trees. Proc. Internat. Conf. Learn. Intelligent Optim., 433–445.Google Scholar
- (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. Comput. Games 4630:72–83.Crossref, Google Scholar
- (2012) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Google Scholar
- (2011) Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence 175(11):1856–1876.Crossref, Google Scholar
- (2012) The grand challenge of computer Go: Monte Carlo tree search and extensions. Comm. ACM 55(3):106–113.Crossref, Google Scholar
- (2016) Restocking-based rollout policies for the vehicle routing problem with stochastic demand and duration limits. Transportation Sci. 50(2):591–607.Link, Google Scholar
- (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.Link, Google Scholar
- (2007) Experiments with Monte Carlo Othello. IEEE Congress Evolutionary Comput., Singapore, 4059–4064.Google Scholar
- (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
- (1975) An analysis of alpha-beta pruning. Artificial Intelligence 6(4):293–326.Crossref, Google Scholar
- (2006) Bandit based Monte-Carlo planning. Proc. 17th Eur. Conf. Machine Learn. (Springer, Berlin), 282–293.Google Scholar
- (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
- (2011) Valuation of storage at a liquefied natural gas terminal. Oper. Res. 59(3):602–616.Link, Google Scholar
- (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520.Crossref, Google Scholar
- (2008) Adaptive play in Texas Hold’em Poker. Ghallab M, Spyropoulos CD, Fakotakis N, Avouris N, eds. Proc. 18th Eur. Conf. Artificial Intelligence (ECAI 2008) (IOS Press, Amsterdam), 458–462.Google Scholar
- (2010) Combining UCT and nested Monte Carlo search for single-player general game playing. IEEE Trans. Comput. Intelligent AI Games 2(4):271–277.Crossref, Google Scholar
- (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.Link, Google Scholar
- (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.Link, Google Scholar
- (2007) Playing Othello using Monte Carlo. Working paper, Maastricht University, Maastricht, Netherlands.Google Scholar
- (2008) An Othello evaluation function based on temporal difference learning using probability of winning. IEEE Sympos. Comput. Intelligence Games, 205–211.Google Scholar
- (2017) Dynamic matching for real-time ridesharing. Working paper, Koç University, Istanbul, Turkey.Google Scholar
- (2010) Integrating opponent models with Monte-Carlo tree search in Poker. Interactive Decision Theory and Game Theory (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 37–42.Google Scholar
- (2011) Learning non-random moves for playing Othello: Improving Monte Carlo tree search. IEEE Conf. on Computational Intelligence and Games, Seoul, 305–312.Google Scholar
- (2002) Monte Carlo valuation of American options. Math. Finance 12(3):271–286.Crossref, Google Scholar
- (2010) Monte-Carlo planning in large POMDPs. Adv. in Neural Inform. Processing Systems, vol 23 (Curran Associates, Red Hook, NY), 2164–2172.Google Scholar
- . (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.Crossref, Google Scholar
- (2009) Monte-Carlo tree search in Poker using expected reward distributions. Adv. in Machine Learn. (Springer, Berlin, Heidelberg), 367–381.Crossref, Google Scholar
- (2007) Monte-Carlo tree search in Backgammon. Working paper, University of Liège, Montefiore Institute, Liège, Belgium.Google Scholar

