Corruption-Robust Exploration in Episodic Reinforcement Learning

Thodoris Lykouris
Corresponding Author
https://orcid.org/0000-0002-3375-5579
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Max Simchowitz
https://orcid.org/0000-0001-9900-1238
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Aleksandrs Slivkins
https://orcid.org/0000-0001-6899-6383
Microsoft Research Lab, New York, New York 10012
Wen Sun
https://orcid.org/0000-0003-4322-5878
Department of Computer Science, Cornell University, Ithaca, New York 14850

Published Online: 23 May 2024
https://doi.org/10.1287/moor.2021.0202

Supplemental Material

moor.2021.0202.sm1.pdf

Mathematics of Operations Research, Volume 50, Issue 2, May 2025, Pages iii, 783-1583

Information

Received: August 15, 2021
Accepted: October 08, 2023
Published Online: May 23, 2024

Copyright © 2024, INFORMS

Cite as

Thodoris Lykouris, Max Simchowitz, Aleksandrs Slivkins, Wen Sun (2024) Corruption-Robust Exploration in Episodic Reinforcement Learning. Mathematics of Operations Research 50(2):1277-1304.

https://doi.org/10.1287/moor.2021.0202

Acknowledgments

The authors thank Christina Lee Yu, Sean Sinclair, and Éva Tardos for useful discussions that helped improve the presentation of this paper, as well as the anonymous review teams at the 34th Annual Conference on Learning Theory and Mathematics of Operations Research for their valuable feedback. A previous version of this paper appeared on arXiv.org in November 2019, focusing on tabular RL (Theorem 2.1). The extension to linear MDPs (Theorem 2.2) was added in April 2020. A one-page abstract appeared in the Proceedings of the 34th Annual Conference on Learning Theory. All results were obtained while T. Lykouris and W. Sun were postdocs and M. Simchowitz was an intern at Microsoft Research Lab–New York City.