Free Access

Markov Decision Processes, AlphaGo, and Monte Carlo Tree Search: Back to the Future

Published Online:3 Oct 2017https://doi.org/10.1287/educ.2017.0166

Abstract

In 2016, a computer Go-playing program called AlphaGo stunned the (human) world by winning a match (4 games to 1) against the reigning human world champion, a feat more impressive than previous victories by computer programs in chess (Deep Blue) and the TV game show Jeopardy! (Watson). The main engine behind AlphaGo combines machine learning approaches in the form of deep neural networks with a technique called Monte Carlo tree search, whose roots can be traced back to an adaptive multistage sampling simulation-based algorithm for Markov decision processes (MDPs) published in Operations Research in 2005 [H. S. Chang, M. C. Fu, J. Hu, and S. I. Marcus. An adaptive sampling algorithm for solving Markov decision processes. Operations Research 53(1):126–139, 2005] (and introduced even earlier in 2002). This tutorial describes AlphaGo and the simulation-based MDP algorithm, as well as providing contextual and historical background material for both, and uses simple examples to illustrate the main ideas behind Monte Carlo tree search.

Your Access Options

Login Options

INFORMS Member Login Nonmember Login

Purchase Options
Save for later Item saved, go to cart

Tutorials in OR, TutorialsNew $20.00 Add to cart

Tutorials in OR, TutorialsNew Checkout
Other Options
Token Access
- Insert token number
Claim access using a token
Restore guest access
Applies for purchases made as a guest

Download PDF

cover image INFORMS TutORials in Operations Research

Leading Developments from INFORMS Communities

September 2017

Article Information

Metrics

Information

Published Online:October 03, 2017

Cite as

(2017) Markov Decision Processes, AlphaGo, and Monte Carlo Tree Search: Back to the Future. INFORMS TutORials in Operations Research null(null):68-88.

https://doi.org/10.1287/educ.2017.0166

Keywords

PDF download