Frozen-State Value Iteration: Faster Reinforcement Learning by Freezing Slow States

Yijia Wang
https://orcid.org/0000-0001-8606-4846
University of Pittsburgh, Pittsburgh, Pennsylvania 15260
Daniel R. Jiang
Corresponding Author
https://orcid.org/0000-0002-5388-8061
University of Pittsburgh, Pittsburgh, Pennsylvania 15260; and Meta Platforms, Menlo Park, California 94025

Published Online: 18 Mar 2026
https://doi.org/10.1287/mnsc.2023.00012

Abstract

We study infinite-horizon Markov decision processes (MDPs) with fast–slow structure, in which some state variables evolve rapidly (fast states), whereas others change more gradually (slow states). This structure commonly arises in practice when decisions must be made at high frequency over long horizons and slowly changing information still plays a critical role in determining optimal actions. Examples include inventory control under slowly changing demand indicators and dynamic pricing with gradually shifting consumer behavior. Modeling the problem at the natural decision frequency leads to MDPs with discount factors close to one, which makes them computationally challenging to solve. We propose a novel approximation strategy that freezes slow states during phases of lower-level planning and subsequently applies value iteration to an auxiliary upper-level MDP that evolves on a slower timescale. Freezing slow states for short periods yields easier-to-solve lower-level problems, whereas the slower upper-level timescale allows for a more favorable discount factor. On the theoretical side, we analyze the regret incurred by our frozen-state approach, and this analysis yields simple insights into how to trade off regret against computational cost. Empirically, we benchmark our new frozen-state methods on three domains: (i) inventory control with fixed order costs, (ii) a grid world problem with spatial tasks, and (iii) dynamic pricing with reference price effects. We demonstrate that the new methods produce high-quality policies with significantly less computation, and we show that simply omitting slow states is often a poor heuristic.
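As a rough illustration of why a slower upper-level timescale helps, the sketch below applies the textbook sup-norm contraction bound for value iteration: starting from V0 = 0 with rewards bounded by r_max, the error after K sweeps is at most γ^K · r_max / (1 − γ). It assumes (our simplification, not the paper's construction) that one upper-level step spans T base periods, so the upper-level discount is γ^T and per-step rewards are bounded by T · r_max. The names `vi_iterations` and `compare` are hypothetical. The sketch counts only outer value-iteration sweeps and ignores the cost of solving each lower-level problem, so it is not a full accounting of computational cost.

```python
import math


def vi_iterations(gamma: float, eps: float, reward_bound: float) -> int:
    """Smallest K with gamma**K * reward_bound / (1 - gamma) <= eps.

    Standard sup-norm contraction bound for value iteration started at
    V0 = 0, with per-step rewards bounded in absolute value by reward_bound.
    """
    initial_gap = reward_bound / (1.0 - gamma)  # bound on ||V* - V0||
    if initial_gap <= eps:
        return 0
    return math.ceil(math.log(eps / initial_gap) / math.log(gamma))


def compare(gamma: float, T: int, eps: float = 1e-3, r_max: float = 1.0):
    """Outer-sweep bounds for the base MDP vs. a hypothetical upper-level MDP.

    Assumption: one upper-level step spans T base periods, giving discount
    gamma**T and a (conservative) per-step reward bound of T * r_max.
    """
    base = vi_iterations(gamma, eps, r_max)
    upper = vi_iterations(gamma ** T, eps, T * r_max)
    return base, upper


if __name__ == "__main__":
    for T in (1, 10, 50):
        base, upper = compare(gamma=0.999, T=T)
        print(f"T={T:>3}: upper discount={0.999 ** T:.4f}, "
              f"sweep bound base={base}, upper={upper}")
```

With γ = 0.999 and T = 10, the upper-level discount is about 0.990 and the bound on outer sweeps shrinks by roughly a factor of T, which is the intuition behind the "more favorable discount factor" in the abstract.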

This paper was accepted by J. George Shanthikumar, data science.

Funding: This research is based upon work supported by the U.S. National Science Foundation [Grant 1807536].

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2023.00012.

Information

Received: January 2, 2023
Accepted: September 4, 2025
Published Online: March 18, 2026

Cite as

Yijia Wang, Daniel R. Jiang (2026) Frozen-State Value Iteration: Faster Reinforcement Learning by Freezing Slow States. Management Science 0(0).

https://doi.org/10.1287/mnsc.2023.00012

Acknowledgments

The authors are grateful to the department editor, associate editor, and three anonymous reviewers for their multiple rounds of careful reading and constructive feedback. Their comments helped us catch important issues and substantially improve the paper, including strengthening the organization, sharpening the intuition and exposition, and clarifying key points throughout.
