The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation

Anna Winnicki (Corresponding Author)
https://orcid.org/0000-0001-9880-2340
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801; and Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801

Joseph Lubars
https://orcid.org/0000-0001-9273-8456
Sandia National Laboratories, Albuquerque, New Mexico 87123

Michael Livesay
https://orcid.org/0000-0002-2594-3772
Sandia National Laboratories, Albuquerque, New Mexico 87123

R. Srikant
https://orcid.org/0000-0003-1483-5204
Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, Illinois 61801; and Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801; and c3.ai Digital Transformation Institute, University of Illinois Urbana-Champaign, Urbana, Illinois 61801

Published Online: 30 May 2024
https://doi.org/10.1287/opre.2022.0357

Abstract

Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors that may lead to instabilities when approximate dynamic programming (DP) techniques are used to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate DP with function approximation. We quantitatively characterize the impact of lookahead and m-step rollout on the performance of approximate DP with function approximation and establish three results. (i) Without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge. (ii) Both lookahead and m-step rollout improve the convergence rate of approximate DP. (iii) Lookahead helps mitigate the effect of function approximation and the discount factor on the asymptotic performance of the algorithm. Our results are presented for two approximate DP methods: one that uses least-squares regression to perform function approximation and another that performs several steps of gradient descent on the least-squares objective in each iteration.
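
To make the ingredients named in the abstract concrete, the following is a minimal sketch of one way to combine lookahead, m-step rollout, and least-squares regression in an approximate policy iteration loop. It is an illustration under stated assumptions, not necessarily the algorithm analyzed in the paper: it assumes a small finite MDP with a known model, noiseless rollouts, and a regression over all states. The names `P`, `r`, `Phi`, `H`, `m`, and every function name are hypothetical and chosen only for this example.

```python
import numpy as np

def bellman_optimality(P, r, gamma, J):
    """One application of the Bellman optimality operator.

    P has shape (A, S, S) with P[a, s, n] = Pr(next = n | state = s, action = a);
    r has shape (S, A). Returns the updated values and the greedy actions.
    """
    Q = r + gamma * np.einsum("asn,n->sa", P, J)
    return Q.max(axis=1), Q.argmax(axis=1)

def lookahead_policy(P, r, gamma, J, H):
    """H-step lookahead: apply the optimality operator H-1 times, then act greedily."""
    for _ in range(H - 1):
        J, _ = bellman_optimality(P, r, gamma, J)
    _, mu = bellman_optimality(P, r, gamma, J)
    return mu, J

def rollout(P, r, gamma, J, mu, m):
    """m-step rollout: apply the Bellman operator of the fixed policy mu m times."""
    idx = np.arange(len(mu))
    P_mu = P[mu, idx, :]   # row s is the transition row of action mu[s]
    r_mu = r[idx, mu]      # reward of action mu[s] in state s
    for _ in range(m):
        J = r_mu + gamma * P_mu @ J
    return J

def approx_policy_iteration(P, r, gamma, Phi, H, m, iters):
    """Alternate lookahead-based improvement with rollout-based evaluation,
    fitting the rollout values by least squares on the feature matrix Phi (S, d)."""
    theta = np.zeros(Phi.shape[1])
    mu = np.zeros(Phi.shape[0], dtype=int)
    for _ in range(iters):
        J = Phi @ theta
        mu, J_look = lookahead_policy(P, r, gamma, J, H)
        target = rollout(P, r, gamma, J_look, mu, m)
        theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return theta, mu
```

Passing `Phi = np.eye(S)` recovers the tabular case, in which the least-squares fit is exact. With a feature matrix that has fewer columns than states, the fit introduces an error at every iteration, and varying `H` and `m` in this sketch is one way to explore the interplay the abstract describes.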

Funding: The research presented here was supported in part by a grant from Sandia National Labs and the NSF [Grants CCF 1934986, CCF 2207547, CNS 2106801], ONR [Grant N00014-19-1-2566], and ARO [Grant W911NF-19-1-0379].

Volume 73, Issue 1

January-February 2025

Pages iii-vii, 1-582, C2-C3

Article Information


Received: July 12, 2022
Accepted: February 21, 2024
Published Online: May 30, 2024

Cite as

Anna Winnicki, Joseph Lubars, Michael Livesay, R. Srikant (2024) The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation. Operations Research 73(1):139-156.

https://doi.org/10.1287/opre.2022.0357

Acknowledgments

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology & Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration [Contract DE-NA0003525]. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the U.S. Government.
