Open Access

Global Optimality Guarantees for Policy Gradient Methods

Jalaj Bhandari
[email protected]
https://orcid.org/0000-0002-7115-8986
Operations Research, Columbia University, New York, New York 10027

Daniel Russo (Corresponding Author)
[email protected]
https://orcid.org/0000-0001-5926-8624
Graduate School of Business, Columbia University, New York, New York 10027

Published Online: 5 Jan 2024

https://doi.org/10.1287/opre.2021.0014

Supplemental Material

opre.2021.0014.sm1.pdf

Operations Research

Volume 72, Issue 5

September-October 2024

Pages iii-vii, 1751-2261, C2-C3

Article Information

Received: January 08, 2021
Accepted: December 01, 2022
Published Online: January 05, 2024

Copyright © 2024 The Author(s)

Cite as

Jalaj Bhandari, Daniel Russo (2024) Global Optimality Guarantees for Policy Gradient Methods. Operations Research 72(5):1906-1927.

https://doi.org/10.1287/opre.2021.0014

Acknowledgments

The authors thank the anonymous referees for feedback that helped improve some aspects of the paper. Part of this work was completed while J. Bhandari was a research fellow in the Theory of Reinforcement Learning program at the Simons Institute for the Theory of Computing, University of California, Berkeley. J. Bhandari thanks Peter Bartlett and the Simons Institute for that opportunity and Garud Iyengar for love and support during the PhD program at Columbia University.