Model-Free Nonstationary Reinforcement Learning: Near-Optimal Regret and Applications in Multiagent Reinforcement Learning and Inventory Control
- Weichao Mao
https://orcid.org/0000-0001-8301-4173
Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801
- Kaiqing Zhang
https://orcid.org/0000-0002-7446-7581
Department of Electrical and Computer Engineering & Institute for Systems Research, University of Maryland, College Park, Maryland 20740
- Ruihao Zhu (Corresponding Author)
https://orcid.org/0000-0003-1463-1308
Cornell SC Johnson College of Business & Nolan School of Hotel Administration, Ithaca, New York 14853
- David Simchi-Levi
https://orcid.org/0000-0002-4650-1519
Institute for Data, Systems, and Society, Department of Civil and Environmental Engineering, Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
- Tamer Başar
https://orcid.org/0000-0003-4406-7875
Department of Electrical and Computer Engineering & Coordinated Science Laboratory, University of Illinois Urbana-Champaign, Urbana, Illinois 61801
Supplemental Material
The replication files for this article are available HERE.

