Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
- Gen Li, Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Changxiao Cai, Department of Biostatistics, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Yuxin Chen (https://orcid.org/0000-0001-9256-5815), Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Yuting Wei (Corresponding Author; https://orcid.org/0000-0003-1488-4647), Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
- Yuejie Chi (https://orcid.org/0000-0002-6766-5459), Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213

