Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Published Online:https://doi.org/10.1287/opre.2019.1939

References

  • Al-Kanj L, Powell WB, Bouzaiene-Ayari B (2016) The information-collecting vehicle routing problem: Stochastic optimization for emergency storm response. Preprint, submitted May 18, https://arxiv.org/abs/1605.05711.Google Scholar
  • Andersen L, Broadie M (2004) Primal-dual simulation algorithm for pricing multidimensional American options. Management Sci. 50(9):1222–1234.LinkGoogle Scholar
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.CrossrefGoogle Scholar
  • Auger D, Couëtoux A, Teytaud O (2013) Continuous upper confidence trees with polynomial exploration—consistency. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Berlin, Heidelberg), 194–209.Google Scholar
  • Bertsimas D, Griffith JD, Gupta V, Kochenderfer MJ, Mišić VV (2017) A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems. Eur. J. Oper. Res. 263(2):664–678.CrossrefGoogle Scholar
  • Breiman L (1992) Probability (Society of Industrial and Applied Mathematics, Philadelphia, PA).CrossrefGoogle Scholar
  • Broadie M, Glasserman P (1997) Pricing American-style securities using simulation. J. Econom. Dynam. Control. 21(8–9):1323–1352.CrossrefGoogle Scholar
  • Brown DB, Smith JE (2011) Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Sci. 57(10):1752–1770.LinkGoogle Scholar
  • Brown DB, Smith JE (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.LinkGoogle Scholar
  • Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4):785–801.LinkGoogle Scholar
  • Browne C, Powley E, Whitehouse D, Lucas S, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans. Intelligence AI Games. 4(1):1–49.CrossrefGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.CrossrefGoogle Scholar
  • Cazenave T (2009) Nested Monte-Carlo search. Proc. 21st Internat. Joint Conf. Artificial Intelligence, Pasadena, CA, 456–461.Google Scholar
  • Chand S, Hsu V, Sethi S (2002) Forecast, solution, and rolling horizons in operations management problems: A classified bibliography. Manufacturing Service Oper. Management 4(1):25–43.LinkGoogle Scholar
  • Chang HS, Fu MC, Hu JQ, Marcus SI (2005) An adaptive sampling algorithm for solving Markov decision processes. Oper. Res. 53(1):126–139.LinkGoogle Scholar
  • Chaslot G, Saito JT, Uiterwijk JWHM, Bouzy B, van den Herik HJ (2006) Monte-Carlo strategies for computer go. Proc. 18th Belgian-Dutch Conf. Artificial Intelligence, 83–90.Google Scholar
  • Chaslot G, Bakkes S, Szita I, Spronck P (2008) Monte-Carlo tree search: A new framework for Game AI. Proc. 4th Artificial Intelligence Interactive Digital Entertainment Conf., 216–217.Google Scholar
  • Chen Y, Farias VF (2013) Simple policies for dynamic pricing with imperfect forecasts. Oper. Res. 61(3):612–624.LinkGoogle Scholar
  • Couëtoux A, Hoock JB, Sokolovska N, Teytaud O, Bonnard N (2011) Continuous upper confidence trees. Proc. Internat. Conf. Learn. Intelligent Optim., 433–445.Google Scholar
  • Coulom R (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. Comput. Games 4630:72–83.CrossrefGoogle Scholar
  • Desai VV, Farias VF, Moallemi CC (2012) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Google Scholar
  • Gelly S, Silver D (2011) Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence 175(11):1856–1876.CrossrefGoogle Scholar
  • Gelly S, Kocsis L, Schoenauer M, Sebag M, Silver D, Szepesvári C, Teytaud O (2012) The grand challenge of computer Go: Monte Carlo tree search and extensions. Comm. ACM 55(3):106–113.CrossrefGoogle Scholar
  • Goodson JC, Thomas BW, Ohlmann JW (2016) Restocking-based rollout policies for the vehicle routing problem with stochastic demand and duration limits. Transportation Sci. 50(2):591–607.LinkGoogle Scholar
  • Haugh MB, Kogan L (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.LinkGoogle Scholar
  • Hingston P, Masek M (2007) Experiments with Monte Carlo Othello. IEEE Congress Evolutionary Comput., Singapore, 4059–4064.Google Scholar
  • Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.LinkGoogle Scholar
  • Knuth DE, Moore RW (1975) An analysis of alpha-beta pruning. Artificial Intelligence 6(4):293–326.CrossrefGoogle Scholar
  • Kocsis L, Szepesvári C (2006) Bandit based Monte-Carlo planning. Proc. 17th Eur. Conf. Machine Learn. (Springer, Berlin), 282–293.Google Scholar
  • Kushner HJ, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
  • Lai G, Wang MX, Kekre S, Scheller-Wolf A, Secomandi N (2011) Valuation of storage at a liquefied natural gas terminal. Oper. Res. 59(3):602–616.LinkGoogle Scholar
  • Land AH, Doig AG (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520.CrossrefGoogle Scholar
  • Maitrepierre R, Mary J, Munos R (2008) Adaptive play in Texas Hold’em Poker. Ghallab M, Spyropoulos CD, Fakotakis N, Avouris N, eds. Proc. 18th Eur. Conf. Artificial Intelligence (ECAI 2008) (IOS Press, Amsterdam), 458–462.Google Scholar
  • Méhat J, Cazenave T (2010) Combining UCT and nested Monte Carlo search for single-player general game playing. IEEE Trans. Comput. Intelligent AI Games 2(4):271–277.CrossrefGoogle Scholar
  • Nadarajah S, Margot F, Secomandi N (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.LinkGoogle Scholar
  • Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.LinkGoogle Scholar
  • Nijssen JPAM (2007) Playing Othello using Monte Carlo. Working paper, Maastricht University, Maastricht, Netherlands.Google Scholar
  • Osaki Y, Shibahara K, Tajima Y, Kotani Y (2008) An Othello evaluation function based on temporal difference learning using probability of winning. IEEE Sympos. Comput. Intelligence Games, 205–211.Google Scholar
  • Ozkan E, Ward A (2017) Dynamic matching for real-time ridesharing. Working paper, Koç University, Istanbul, Turkey.Google Scholar
  • Ponsen M, Gerritsen G, Chaslot G (2010) Integrating opponent models with Monte-Carlo tree search in Poker. Interactive Decision Theory and Game Theory (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 37–42.Google Scholar
  • Robles D, Rohlfshagen P, Lucas SM (2011) Learning non-random moves for playing Othello: Improving Monte Carlo tree search. IEEE Conf. on Computational Intelligence and Games, Seoul, 305–312.Google Scholar
  • Rogers LC (2002) Monte Carlo valuation of American options. Math. Finance 12(3):271–286.CrossrefGoogle Scholar
  • Silver D, Veness J (2010) Monte-Carlo planning in large POMDPs. Adv. in Neural Inform. Processing Systems, vol 23 (Curran Associates, Red Hook, NY), 2164–2172.Google Scholar
  • Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, et al.. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.CrossrefGoogle Scholar
  • Van den Broeck G, Driessens K, Ramon J (2009) Monte-Carlo tree search in Poker using expected reward distributions. Adv. in Machine Learn. (Springer, Berlin, Heidelberg), 367–381.CrossrefGoogle Scholar
  • Van Lishout F, Chaslot G, Uiterwijk JWHM (2007) Monte-Carlo tree search in Backgammon. Working paper, University of Liège, Montefiore Institute, Liège, Belgium.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.