The Knowledge Gradient Algorithm for a General Class of Online Learning Problems

Ilya O. Ryzhov
Robert H. Smith School of Business, University of Maryland, College Park, Maryland 20742

Warren B. Powell
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08544

Peter I. Frazier
Department of Operations Research and Information Engineering, Cornell University, Ithaca, New York 14853

Published Online: 1 Feb 2012
https://doi.org/10.1287/opre.1110.0999

Abstract

We derive a one-period look-ahead policy, the knowledge gradient (KG) policy, for finite- and infinite-horizon online optimal learning problems with Gaussian rewards. Our approach handles the case where our prior beliefs about the rewards are correlated, which traditional multiarmed bandit methods do not handle. Experiments show that the KG policy performs competitively against the best-known approximation to the optimal policy in the classic bandit problem, and that it outperforms many learning policies in the correlated case.
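
To make the idea of a one-period look-ahead policy concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the simplest setting only: independent normal beliefs (not the correlated case), a known measurement noise variance, at least two arms, and an undiscounted finite horizon. The function names (`kg_factor`, `online_kg_choice`, `update`) and the parameter `remaining` (the number of rewards still to be collected after the current one) are illustrative assumptions, and the formulas are the commonly stated ones for the independent-beliefs knowledge gradient.

```python
import math


def _f(z):
    """f(z) = z * Phi(z) + phi(z), with Phi and phi the standard normal cdf and pdf."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return z * cdf + pdf


def kg_factor(mu, var, noise_var, x):
    """Expected one-step increase in the best posterior mean from measuring arm x.

    Assumes independent normal beliefs N(mu[i], var[i]) and at least two arms.
    """
    # Standard deviation of the change in the posterior mean of arm x.
    sigma_tilde = var[x] / math.sqrt(var[x] + noise_var)
    if sigma_tilde == 0.0:
        return 0.0  # nothing left to learn about this arm
    best_other = max(m for i, m in enumerate(mu) if i != x)
    z = -abs(mu[x] - best_other) / sigma_tilde
    return sigma_tilde * _f(z)


def online_kg_choice(mu, var, noise_var, remaining):
    """Pick the arm maximizing immediate reward plus the value of information.

    The information term is weighted by `remaining`, the number of future
    rewards that could benefit from what is learned now (hypothetical name).
    """
    arms = range(len(mu))
    scores = [mu[x] + remaining * kg_factor(mu, var, noise_var, x) for x in arms]
    return max(arms, key=scores.__getitem__)


def update(mu, var, noise_var, x, reward):
    """Conjugate normal update of the belief about arm x after observing `reward`."""
    precision = 1.0 / var[x] + 1.0 / noise_var
    mu[x] = (mu[x] / var[x] + reward / noise_var) / precision
    var[x] = 1.0 / precision
```

With `remaining = 0` the rule reduces to pure exploitation (pick the largest current mean), while a long remaining horizon pushes the choice toward arms whose measurement is expected to change the decision. Handling correlated beliefs, as the abstract describes, would require a multivariate update and a different computation of the look-ahead value, which this sketch does not attempt.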

Volume 60, Issue 1

January-February 2012

Pages iii-248

Information

Received: September 01, 2009
Accepted: June 01, 2011
Published Online: February 01, 2012

Cite as

Ilya O. Ryzhov, Warren B. Powell, Peter I. Frazier (2012) The Knowledge Gradient Algorithm for a General Class of Online Learning Problems. Operations Research 60(1):180-195.

https://doi.org/10.1287/opre.1110.0999
