The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model

Laixi Shi (Corresponding Author)
https://orcid.org/0000-0003-4038-8620
Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, Maryland 21218

Gen Li
Department of Statistics and Data Science, The Chinese University of Hong Kong, Hong Kong

Yuting Wei
https://orcid.org/0000-0003-1488-4647
Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Yuxin Chen
https://orcid.org/0000-0001-9256-5815
Department of Statistics and Data Science, Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Matthieu Geist
Earth Species Project

Yuejie Chi
https://orcid.org/0000-0002-6766-5459
Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06511

Published Online: 12 Jun 2026
https://doi.org/10.1287/opre.2025.2240

Abstract

This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal Markov decision process (MDP). Despite recent efforts, the sample complexity of RMDPs remained mostly unsettled regardless of the uncertainty set in use. It was unclear if distributional robustness bears any statistical consequences when benchmarked against standard RL. Assuming access to a generative model that draws samples based on the nominal MDP, we provide a near-optimal characterization of the sample complexity of RMDPs when the uncertainty set is specified via either the total variation (TV) distance or $χ^{2}$ divergence. The algorithm studied here is a model-based method called distributionally robust value iteration, which is shown to be near-optimal for the full range of uncertainty levels. Somewhat surprisingly, our results uncover that RMDPs are not necessarily easier or harder to learn than standard MDPs. The statistical consequence incurred by the robustness requirement depends heavily on the size and shape of the uncertainty set: in the case with respect to the TV distance, the minimax sample complexity of RMDPs is always smaller than that of standard MDPs; in the case with respect to the $χ^{2}$ divergence, the sample complexity of RMDPs far exceeds the standard MDP counterpart.
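To make the setting concrete, the following is a minimal sketch of model-based robust value iteration under a TV uncertainty set, assuming a small tabular MDP. It is an illustration under stated assumptions, not the authors' implementation: the function names, the array layout `P[s, a, s']`, the symbols `sigma` (uncertainty level) and `gamma` (discount factor), and the greedy primal solution of the inner worst-case problem are all choices made here, and the paper's own formulation may differ. The sketch first builds an empirical nominal model from generative-model samples, then iterates the robust Bellman update on it.

```python
import numpy as np


def worst_case_expectation_tv(p, v, sigma):
    """Return the minimum of q @ v over distributions q with TV(q, p) <= sigma.

    Greedy primal solution (an illustrative choice, not taken from the paper):
    shift up to `sigma` probability mass from the highest-value states onto
    the lowest-value state.
    """
    q = np.asarray(p, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lo = int(np.argmin(v))
    # Mass we may move: capped by sigma and by the mass not already on `lo`.
    budget = min(sigma, 1.0 - q[lo])
    for s in np.argsort(-v):  # states in order of decreasing value
        if budget <= 0.0:
            break
        if s == lo:
            continue
        moved = min(q[s], budget)
        q[s] -= moved
        q[lo] += moved
        budget -= moved
    return float(q @ v)


def empirical_model(P_nominal, n, rng):
    """Estimate the nominal kernel from n generative-model draws per (s, a)."""
    S, A, _ = P_nominal.shape
    P_hat = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P_hat[s, a] = rng.multinomial(n, P_nominal[s, a]) / n
    return P_hat


def robust_value_iteration(P, r, gamma, sigma, iters=500):
    """Iterate the robust Bellman update; return (values, greedy policy)."""
    S, A, _ = P.shape
    v = np.zeros(S)
    q = np.zeros((S, A))
    for _ in range(iters):
        for s in range(S):
            for a in range(A):
                q[s, a] = r[s, a] + gamma * worst_case_expectation_tv(P[s, a], v, sigma)
        v = q.max(axis=1)
    return v, q.argmax(axis=1)
```

A hypothetical call sequence would be `P_hat = empirical_model(P_nominal, n, np.random.default_rng(0))` followed by `robust_value_iteration(P_hat, r, gamma=0.9, sigma=0.1)`. Setting `sigma=0` recovers standard value iteration on the empirical model, which gives a simple baseline for comparing robust and non-robust values.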

Funding: The work of L. Shi and Y. Chi is supported in part by the Office of Naval Research [Grant N00014-19-1-2404] and the National Science Foundation [Grants CCF-2106778, DMS-2134080, and CNS-2148212]. L. Shi is supported by the Leo Finzi Memorial Fellowship, Wei Shen and Xuehong Zhang Presidential Fellowship, Liang Ji-Dian Graduate Fellowship at Carnegie Mellon University, the Resnick Institute and the California Institute of Technology [Computing, Data, and Society Postdoctoral Fellowship]. G. Li is supported in part by the Chinese University of Hong Kong [Direct Grant for Research] and the Hong Kong Research Grants Council [Grants ECS 24305724 and GRF 14307525]. The work of Y. Wei is supported in part by the National Science Foundation [Grants DMS-2147546/2015447, CAREER award DMS-2143215, and CCF-2106778] and the Google Research Scholar Award. The work of Y. Chen is supported in part by the Alfred P. Sloan Research Fellowship, the Google Research Scholar Award, the Air Force Office of Scientific Research [Grant FA9550-22-1-0198], the Office of Naval Research [Grant N00014-22-1-2354], and the National Science Foundation [Grants CCF-2221009 and CCF-1907661].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2025.2240.

Articles In Advance

Received: April 12, 2024
Accepted: March 22, 2026
Published Online: June 12, 2026

Cite as

Laixi Shi, Gen Li, Yuting Wei, Yuxin Chen, Matthieu Geist, Yuejie Chi (2026) The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model. Operations Research 0(0).

https://doi.org/10.1287/opre.2025.2240
