Bayesian Exploration for Approximate Dynamic Programming

Ilya O. Ryzhov (Corresponding Author)
http://orcid.org/0000-0002-4191-084X
Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742; Institute for Systems Research, A. James Clark School of Engineering, University of Maryland, College Park, Maryland 20742

Martijn R. K. Mes
Industrial Engineering and Business Information Systems, University of Twente, 7500 AE Enschede, Netherlands

Warren B. Powell
Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Gerald van den Berg
Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Published Online: 18 Jan 2019
https://doi.org/10.1287/opre.2018.1772

Abstract

Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other domains. We propose a new approach to the exploration/exploitation dilemma in ADP that leverages two important concepts from the optimal learning literature: first, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP; second, we develop a new exploration strategy based on the concept of value of information and prove that it systematically explores the state space. An important advantage of our framework is that it can be integrated into both parametric and nonparametric value function approximations, which are widely used in practical implementations of ADP. We evaluate this strategy on a variety of distinct resource allocation problems and demonstrate that, although more computationally intensive, it is highly competitive against other exploration strategies.
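
The abstract's two ingredients, a Bayesian belief that expresses uncertainty about unknown values and an exploration rule scored by value of information, can be illustrated in the simplest setting the optimal learning literature uses: a finite set of alternatives with independent normal beliefs and a known, constant measurement-noise variance. The sketch below is not the paper's ADP policy, which works with value function approximations over a state space (parametric or nonparametric, per the abstract). It shows only the standard conjugate normal update and the knowledge-gradient formula for independent normal beliefs from the ranking-and-selection literature. The class `IndependentNormalBelief`, its methods, the helper functions, and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


def normal_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


@dataclass
class IndependentNormalBelief:
    """Hypothetical independent normal beliefs about the values of a finite set of alternatives."""
    means: List[float]      # posterior means
    variances: List[float]  # posterior variances
    noise_var: float        # ASSUMPTION: known, constant observation-noise variance

    def update(self, x: int, observation: float) -> None:
        """Conjugate normal-normal update after observing alternative x."""
        prior_prec = 1.0 / self.variances[x]
        obs_prec = 1.0 / self.noise_var
        post_var = 1.0 / (prior_prec + obs_prec)
        # Posterior mean is the precision-weighted average of prior mean and observation.
        self.means[x] = post_var * (prior_prec * self.means[x] + obs_prec * observation)
        self.variances[x] = post_var

    def knowledge_gradient(self, x: int) -> float:
        """Expected one-step gain in max_i means[i] from observing x once more.

        Requires at least two alternatives.
        """
        var = self.variances[x]
        if var <= 0.0:
            return 0.0  # nothing left to learn about x
        # Std. dev. of the change in means[x] produced by one observation:
        # sqrt(var - posterior_var) simplifies to var / sqrt(var + noise_var).
        sigma_tilde = var / math.sqrt(var + self.noise_var)
        best_other = max(m for i, m in enumerate(self.means) if i != x)
        zeta = -abs(self.means[x] - best_other) / sigma_tilde
        return sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta))


# Illustrative numbers only: alternative 0 has a slightly lower mean but a much higher variance.
belief = IndependentNormalBelief(means=[1.0, 1.2, 0.5], variances=[4.0, 1.0, 0.25], noise_var=1.0)
kg = [belief.knowledge_gradient(x) for x in range(len(belief.means))]
choice = max(range(len(kg)), key=lambda x: kg[x])  # explore where information is worth most
print("KG values:", [round(v, 4) for v in kg], "-> observe", choice)
belief.update(choice, observation=0.3)
```

Here the high-variance alternative 0 scores highest even though alternative 1 has the larger mean, which is the kind of deliberate exploration a value-of-information rule produces. Carrying this idea over to ADP, where the quantities being learned are value function approximations updated from downstream observations, is the subject of the paper and is not attempted in this sketch.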

The e-companion is available at https://doi.org/10.1287/opre.2018.1772.

Volume 67, Issue 1

January-February 2019

Pages ii-iv, 1-294

Received: July 22, 2015
Accepted: May 10, 2018
Published Online: January 18, 2019

Cite as

Ilya O. Ryzhov, Martijn R. K. Mes, Warren B. Powell, Gerald van den Berg (2019) Bayesian Exploration for Approximate Dynamic Programming. Operations Research 67(1):198-214.

https://doi.org/10.1287/opre.2018.1772
