Uncertainty Quantification and Exploration for Reinforcement Learning

Yi Zhu
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Jing Dong (Corresponding Author)
https://orcid.org/0000-0001-6387-4088
Decision, Risk, and Operations Division, Columbia Business School, New York, New York 10027

Henry Lam
https://orcid.org/0000-0002-3193-563X
Department of Industrial Engineering and Operations Research, Columbia University, New York, New York 10027

Published Online: 2 Mar 2023
https://doi.org/10.1287/opre.2023.2436

Abstract

We investigate statistical uncertainty quantification for reinforcement learning (RL) and its implications for exploration policies. Despite an ever-growing literature on RL applications, fundamental questions about inference and error quantification, such as large-sample behavior, remain largely open. In this paper, we fill this gap by studying the central limit theorem behavior of estimated Q-values and value functions under various RL settings. In particular, we explicitly identify closed-form expressions for the asymptotic variances, which allow us to efficiently construct asymptotically valid confidence regions for key RL quantities. Furthermore, we use these asymptotic expressions to design an effective exploration strategy, which we call Q-value-based Optimal Computing Budget Allocation (Q-OCBA). The policy relies on maximizing the relative discrepancies among the Q-value estimates. Numerical experiments show that our exploration strategy outperforms other benchmark policies.
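
For orientation, the generic shape of a confidence interval that follows from a central limit theorem of this kind can be sketched as below. This is an illustration under assumed notation, not the paper's result: the symbols $\hat{Q}_n(s,a)$, $\sigma^2(s,a)$, $\hat{\sigma}_n(s,a)$, and the sample size $n$ are introduced here for the sketch, and the closed-form variance expressions identified in the paper are not reproduced on this page.

```latex
% Assumed notation (hypothetical, for illustration only):
%   \hat{Q}_n(s,a)      : Q-value estimate for state-action pair (s,a) from n samples
%   \sigma^2(s,a)       : asymptotic variance of that estimate
%   \hat{\sigma}_n(s,a) : a consistent estimator of \sigma(s,a)
%   z_{1-\alpha/2}      : standard normal quantile

% Central limit theorem for the estimate:
\[
  \sqrt{n}\,\bigl(\hat{Q}_n(s,a) - Q(s,a)\bigr)
  \;\xrightarrow{d}\; \mathcal{N}\bigl(0,\ \sigma^2(s,a)\bigr)
\]

% Resulting asymptotic (1-\alpha) confidence interval:
\[
  \hat{Q}_n(s,a) \;\pm\; z_{1-\alpha/2}\,\frac{\hat{\sigma}_n(s,a)}{\sqrt{n}}
\]
```

Provided $\hat{\sigma}_n(s,a)$ is consistent for $\sigma(s,a)$, an interval of this form has coverage approaching $1-\alpha$ as $n$ grows, which is the sense in which such regions are asymptotically valid.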

Funding: This work was supported by the National Science Foundation (1720433).

Volume 72, Issue 4

July-August 2024

Pages iii-vi, 1317-1750, C2-C3

Article Information

Received: May 25, 2020
Accepted: November 23, 2022
Published Online: March 02, 2023

Cite as

Yi Zhu, Jing Dong, Henry Lam (2023) Uncertainty Quantification and Exploration for Reinforcement Learning. Operations Research 72(4):1689-1709.

https://doi.org/10.1287/opre.2023.2436
