Open Access

Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

Gen Li
[email protected]
Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Changxiao Cai
[email protected]
Department of Biostatistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Yuxin Chen
[email protected]
https://orcid.org/0000-0001-9256-5815
Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Yuting Wei (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-1488-4647
Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Yuejie Chi
[email protected]
https://orcid.org/0000-0002-6766-5459
Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

Published Online: 21 Apr 2023
https://doi.org/10.1287/opre.2023.2450

Supplemental Material

opre.2023.2450.sm1.pdf

Volume 72, Issue 1

January-February 2024

Pages iii-vi, 1-424, C2-C3

Article Information

Received: November 27, 2021
Accepted: March 16, 2023
Published Online: April 21, 2023

Cite as

Gen Li, Changxiao Cai, Yuxin Chen, Yuting Wei, Yuejie Chi (2023) Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis. Operations Research 72(1):222-236.

https://doi.org/10.1287/opre.2023.2450

Acknowledgments

The authors are grateful to Laixi Shi for helpful discussions about the lower bound and thank Shaocong Ma for pointing out some errors in an early version of this work. Part of this work was done when G. Li, Y. Chen, and Y. Wei were visiting the Simons Institute for the Theory of Computing.
