Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

Daniel R. Jiang
Corresponding Author
Daniel R. Jiang
[email protected]
https://orcid.org/0000-0002-5388-8061
Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15213;
Search for more papers by this author
,
Lina Al-Kanj
Lina Al-Kanj
[email protected]
https://orcid.org/0000-0003-4429-4525
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
Search for more papers by this author
,
Warren B. Powell
Corresponding Author
Warren B. Powell
[email protected]
https://orcid.org/0000-0002-4364-7555
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544
Search for more papers by this author

Daniel R. Jiang

Corresponding Author

Daniel R. Jiang

[email protected]

https://orcid.org/0000-0002-5388-8061

Department of Industrial Engineering, University of Pittsburgh, Pittsburgh, Pennsylvania 15213;

Search for more papers by this author

Lina Al-Kanj

[email protected]

https://orcid.org/0000-0003-4429-4525

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

Search for more papers by this author

Warren B. Powell

Corresponding Author

Warren B. Powell

[email protected]

https://orcid.org/0000-0002-4364-7555

Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

Search for more papers by this author

Published Online:11 Sep 2020https://doi.org/10.1287/opre.2019.1939

References

Al-Kanj L, Powell WB, Bouzaiene-Ayari B (2016) The information-collecting vehicle routing problem: Stochastic optimization for emergency storm response. Preprint, submitted May 18, https://arxiv.org/abs/1605.05711.Google Scholar
Andersen L, Broadie M (2004) Primal-dual simulation algorithm for pricing multidimensional American options. Management Sci. 50(9):1222–1234.Link, Google Scholar
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite time analysis of the multiarmed bandit problem. Machine Learn. 47(2–3):235–256.Crossref, Google Scholar
Auger D, Couëtoux A, Teytaud O (2013) Continuous upper confidence trees with polynomial exploration—consistency. Proc. Joint Eur. Conf. Machine Learn. Knowledge Discovery Databases (Springer, Berlin, Heidelberg), 194–209.Google Scholar
Bertsimas D, Griffith JD, Gupta V, Kochenderfer MJ, Mišić VV (2017) A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems. Eur. J. Oper. Res. 263(2):664–678.Crossref, Google Scholar
Breiman L (1992) Probability (Society of Industrial and Applied Mathematics, Philadelphia, PA).Crossref, Google Scholar
Broadie M, Glasserman P (1997) Pricing American-style securities using simulation. J. Econom. Dynam. Control. 21(8–9):1323–1352.Crossref, Google Scholar
Brown DB, Smith JE (2011) Dynamic portfolio optimization with transaction costs: Heuristics and dual bounds. Management Sci. 57(10):1752–1770.Link, Google Scholar
Brown DB, Smith JE (2014) Information relaxations, duality, and convex stochastic dynamic programs. Oper. Res. 62(6):1394–1415.Link, Google Scholar
Brown DB, Smith JE, Sun P (2010) Information relaxations and duality in stochastic dynamic programs. Oper. Res. 58(4):785–801.Link, Google Scholar
Browne C, Powley E, Whitehouse D, Lucas S, Cowling PI, Rohlfshagen P, Tavener S, Perez D, Samothrakis S, Colton S (2012) A survey of Monte Carlo tree search methods. IEEE Trans. Intelligence AI Games. 4(1):1–49.Crossref, Google Scholar
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Crossref, Google Scholar
Cazenave T (2009) Nested Monte-Carlo search. Proc. 21st Internat. Joint Conf. Artificial Intelligence, Pasadena, CA, 456–461.Google Scholar
Chand S, Hsu V, Sethi S (2002) Forecast, solution, and rolling horizons in operations management problems: A classified bibliography. Manufacturing Service Oper. Management 4(1):25–43.Link, Google Scholar
Chang HS, Fu MC, Hu JQ, Marcus SI (2005) An adaptive sampling algorithm for solving Markov decision processes. Oper. Res. 53(1):126–139.Link, Google Scholar
Chaslot G, Saito JT, Uiterwijk JWHM, Bouzy B, van den Herik HJ (2006) Monte-Carlo strategies for computer go. Proc. 18th Belgian-Dutch Conf. Artificial Intelligence, 83–90.Google Scholar
Chaslot G, Bakkes S, Szita I, Spronck P (2008) Monte-Carlo tree search: A new framework for Game AI. Proc. 4th Artificial Intelligence Interactive Digital Entertainment Conf., 216–217.Google Scholar
Chen Y, Farias VF (2013) Simple policies for dynamic pricing with imperfect forecasts. Oper. Res. 61(3):612–624.Link, Google Scholar
Couëtoux A, Hoock JB, Sokolovska N, Teytaud O, Bonnard N (2011) Continuous upper confidence trees. Proc. Internat. Conf. Learn. Intelligent Optim., 433–445.Google Scholar
Coulom R (2007) Efficient selectivity and backup operators in Monte-Carlo tree search. Comput. Games 4630:72–83.Crossref, Google Scholar
Desai VV, Farias VF, Moallemi CC (2012) Pathwise optimization for optimal stopping problems. Management Sci. 58(12):2292–2308.Google Scholar
Gelly S, Silver D (2011) Monte-Carlo tree search and rapid action value estimation in computer Go. Artificial Intelligence 175(11):1856–1876.Crossref, Google Scholar
Gelly S, Kocsis L, Schoenauer M, Sebag M, Silver D, Szepesvári C, Teytaud O (2012) The grand challenge of computer Go: Monte Carlo tree search and extensions. Comm. ACM 55(3):106–113.Crossref, Google Scholar
Goodson JC, Thomas BW, Ohlmann JW (2016) Restocking-based rollout policies for the vehicle routing problem with stochastic demand and duration limits. Transportation Sci. 50(2):591–607.Link, Google Scholar
Haugh MB, Kogan L (2004) Pricing American options: A duality approach. Oper. Res. 52(2):258–270.Link, Google Scholar
Hingston P, Masek M (2007) Experiments with Monte Carlo Othello. IEEE Congress Evolutionary Comput., Singapore, 4059–4064.Google Scholar
Jasin S, Kumar S (2012) A re-solving heuristic with bounded revenue loss for network revenue management with customer choice. Math. Oper. Res. 37(2):313–345.Link, Google Scholar
Knuth DE, Moore RW (1975) An analysis of alpha-beta pruning. Artificial Intelligence 6(4):293–326.Crossref, Google Scholar
Kocsis L, Szepesvári C (2006) Bandit based Monte-Carlo planning. Proc. 17th Eur. Conf. Machine Learn. (Springer, Berlin), 282–293.Google Scholar
Kushner HJ, Yin GG (2003) Stochastic Approximation and Recursive Algorithms and Applications (Springer, New York).Google Scholar
Lai G, Wang MX, Kekre S, Scheller-Wolf A, Secomandi N (2011) Valuation of storage at a liquefied natural gas terminal. Oper. Res. 59(3):602–616.Link, Google Scholar
Land AH, Doig AG (1960) An automatic method of solving discrete programming problems. Econometrica 28(3):497–520.Crossref, Google Scholar
Maitrepierre R, Mary J, Munos R (2008) Adaptive play in Texas Hold’em Poker. Ghallab M, Spyropoulos CD, Fakotakis N, Avouris N, eds. Proc. 18th Eur. Conf. Artificial Intelligence (ECAI 2008) (IOS Press, Amsterdam), 458–462.Google Scholar
Méhat J, Cazenave T (2010) Combining UCT and nested Monte Carlo search for single-player general game playing. IEEE Trans. Comput. Intelligent AI Games 2(4):271–277.Crossref, Google Scholar
Nadarajah S, Margot F, Secomandi N (2015) Relaxations of approximate linear programs for the real option management of commodity storage. Management Sci. 61(12):3054–3076.Link, Google Scholar
Nascimento JM, Powell WB (2009) An optimal approximate dynamic programming algorithm for the lagged asset acquisition problem. Math. Oper. Res. 34(1):210–237.Link, Google Scholar
Nijssen JPAM (2007) Playing Othello using Monte Carlo. Working paper, Maastricht University, Maastricht, Netherlands.Google Scholar
Osaki Y, Shibahara K, Tajima Y, Kotani Y (2008) An Othello evaluation function based on temporal difference learning using probability of winning. IEEE Sympos. Comput. Intelligence Games, 205–211.Google Scholar
Ozkan E, Ward A (2017) Dynamic matching for real-time ridesharing. Working paper, Koç University, Istanbul, Turkey.Google Scholar
Ponsen M, Gerritsen G, Chaslot G (2010) Integrating opponent models with Monte-Carlo tree search in Poker. Interactive Decision Theory and Game Theory (Association for the Advancement of Artificial Intelligence, Menlo Park, CA), 37–42.Google Scholar
Robles D, Rohlfshagen P, Lucas SM (2011) Learning non-random moves for playing Othello: Improving Monte Carlo tree search. IEEE Conf. on Computational Intelligence and Games, Seoul, 305–312.Google Scholar
Rogers LC (2002) Monte Carlo valuation of American options. Math. Finance 12(3):271–286.Crossref, Google Scholar
Silver D, Veness J (2010) Monte-Carlo planning in large POMDPs. Adv. in Neural Inform. Processing Systems, vol 23 (Curran Associates, Red Hook, NY), 2164–2172.Google Scholar
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, van den Driessche G, Schrittwieser J, et al.. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.Crossref, Google Scholar
Van den Broeck G, Driessens K, Ramon J (2009) Monte-Carlo tree search in Poker using expected reward distributions. Adv. in Machine Learn. (Springer, Berlin, Heidelberg), 367–381.Crossref, Google Scholar
Van Lishout F, Chaslot G, Uiterwijk JWHM (2007) Monte-Carlo tree search in Backgammon. Working paper, University of Liège, Montefiore Institute, Liège, Belgium.Google Scholar

Volume 68, Issue 6

November-December 2020

Pages ii-iv, 1625-1931, C2

Article Information

Metrics

Information

Received:April 17, 2017
Accepted:August 09, 2019
Published Online:September 11, 2020

Cite as

Daniel R. Jiang, Lina Al-Kanj, Warren B. Powell (2020) Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds. Operations Research 68(6):1678-1697.

https://doi.org/10.1287/opre.2019.1939

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Optimistic Monte Carlo Tree Search with Sampled Information Relaxation Dual Bounds

References

Volume 68, Issue 6

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News