Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors

Weidong Han (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-2217-7125
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Warren B. Powell (Corresponding Author)
[email protected]
Department of Operations Research and Financial Engineering, Princeton University, Princeton, New Jersey 08540

Published Online: 29 May 2020
https://doi.org/10.1287/opre.2019.1921

Abstract

We consider an optimal learning problem where we are trying to learn a function that is nonlinear in unknown parameters in an online setting. We formulate the problem as a dynamic program, provide the optimality condition using Bellman’s equation, and propose a multiperiod lookahead policy to overcome the nonconcavity in the value of information. We adopt a sampled belief model, which we refer to as a discrete prior. For an infinite-horizon problem with discounted cumulative rewards, we prove asymptotic convergence properties under the proposed policy, a rare result for online learning. We then demonstrate the approach in three different settings: a health setting where we make medical decisions to maximize healthcare response over time, a dynamic pricing setting where we make pricing decisions to maximize the cumulative revenue, and a clinical pharmacology setting where we make dosage controls to minimize the deviation between actual and target effects.
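As a rough illustration of the sampled belief model mentioned in the abstract, the sketch below keeps a finite set of candidate parameter vectors with a probability on each and applies Bayes' rule after one observation. It is a minimal sketch under stated assumptions, not the authors' implementation: the logistic response function, the Gaussian observation noise, the candidate values, and the names `update_discrete_prior` and `logistic` are all hypothetical choices made for this example.

```python
import math

def update_discrete_prior(probs, thetas, f, x, y, noise_std):
    """Bayes update of the probabilities on a finite set of candidate parameters.

    Assumes (for illustration only) that the observation is y = f(x, theta) + noise
    with Gaussian noise of known standard deviation noise_std.
    """
    # Likelihood of the observed y under each candidate parameter vector
    likes = [math.exp(-0.5 * ((y - f(x, th)) / noise_std) ** 2) for th in thetas]
    unnorm = [p * l for p, l in zip(probs, likes)]
    total = sum(unnorm)
    if total == 0.0:
        # Every likelihood underflowed to zero; fall back to the prior
        return list(probs)
    return [u / total for u in unnorm]

def logistic(x, theta):
    """Hypothetical response function that is nonlinear in theta = (a, b)."""
    a, b = theta
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Three hypothetical candidates with a uniform discrete prior
thetas = [(0.0, 1.0), (1.0, -1.0), (-1.0, 0.5)]
prior = [1 / 3, 1 / 3, 1 / 3]

# One observation y at decision x; the first candidate predicts about 0.731 here
posterior = update_discrete_prior(prior, thetas, logistic, x=1.0, y=0.73, noise_std=0.1)
```

After the update, probability mass shifts toward the candidate whose prediction at `x` is closest to the observed `y`. A policy would then use the updated probabilities when choosing the next decision.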

This article appears in INFORMS Analytics Collections Vol. 16: Advances in Integrating AI & O.R.

Volume 68, Issue 5

September-October 2020

Pages iii-vi, 1285-1624, C2-C3

Received: March 04, 2017
Accepted: July 22, 2019
Published Online: May 29, 2020

Cite as

Weidong Han, Warren B. Powell (2020) Optimal Online Learning for Nonlinear Belief Models Using Discrete Priors. Operations Research 68(5):1538-1556.

https://doi.org/10.1287/opre.2019.1921
