A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation

Zaiwei Chen (Corresponding Author)
https://orcid.org/0000-0001-9915-5595
The School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Siva T. Maguluri
https://orcid.org/0000-0002-5797-1639
The School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332

Sanjay Shakkottai
https://orcid.org/0000-0002-4325-9050
Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, Texas 78712

Karthikeyan Shanmugam
https://orcid.org/0009-0008-2879-5868
IBM Research AI Group, Yorktown Heights, New York 10598

Published Online: 6 Oct 2023
https://doi.org/10.1287/opre.2022.0249

Abstract

This paper develops a unified Lyapunov framework for finite-sample analysis of a Markovian stochastic approximation (SA) algorithm under a contraction operator with respect to an arbitrary norm. The main novelty lies in the construction of a valid Lyapunov function called the generalized Moreau envelope. The smoothness and an approximation property of the generalized Moreau envelope enable us to derive a one-step Lyapunov drift inequality, which is the key to establishing the finite-sample bounds. Our SA result has wide applications, especially in the context of reinforcement learning (RL). Specifically, we show that a large class of value-based RL algorithms can be modeled in the exact form of our Markovian SA algorithm. Therefore, our SA results immediately imply finite-sample guarantees for popular RL algorithms such as $n$-step temporal difference (TD) learning, TD($\lambda$), off-policy V-trace, and Q-learning. As byproducts, by analyzing the convergence bounds of $n$-step TD and TD($\lambda$), we provide theoretical insight into the efficiency of bootstrapping. Moreover, our finite-sample bounds for off-policy V-trace explicitly capture the tradeoff between the variance of the stochastic iterates and the bias in the limit.
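To make the setting concrete, the following is a minimal sketch of a value-based RL algorithm driven by Markovian samples: standard one-step TD learning with a constant step size on a single trajectory of a small Markov reward process. It is only an illustration of the kind of iteration the abstract refers to, not the paper's general framework or its analysis. The transition matrix `P`, reward vector `r`, discount factor, step size, and the function name `td0_markovian` are all hypothetical choices made for this example.

```python
import numpy as np

def td0_markovian(P, r, gamma, num_steps, step_size, seed=0):
    """One-step TD learning along a single Markov chain trajectory.

    Each iteration moves the visited state's estimate toward a noisy
    one-step target, i.e. V[s] <- V[s] + step_size * (target - V[s]).
    Samples are consecutive states of one trajectory, so the noise is
    Markovian rather than i.i.d.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    V = np.zeros(n)
    s = 0  # arbitrary initial state (assumption)
    for _ in range(num_steps):
        s_next = rng.choice(n, p=P[s])        # next state of the chain
        target = r[s] + gamma * V[s_next]     # bootstrapped one-step target
        V[s] += step_size * (target - V[s])   # stochastic approximation step
        s = s_next
    return V

# Hypothetical 3-state Markov reward process, for illustration only.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.3, 0.0, 0.7]])
r = np.array([1.0, 0.0, -1.0])
gamma = 0.9

# Exact fixed point of the Bellman equation V = r + gamma * P V.
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)

V_hat = td0_markovian(P, r, gamma, num_steps=50_000, step_size=0.01)
err = np.max(np.abs(V_hat - V_true))
print("max-norm error:", err)
```

With a constant step size, the iterates in this sketch do not converge exactly; they settle into a neighborhood of `V_true` whose size depends on the step size, which is the type of behavior finite-sample bounds are meant to quantify.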

Funding: This work was supported by RTX, the National Science Foundation [Grants 2019844, 2107037, 211247, 2112533, 2144316, and 2240982], and the Machine Learning Laboratory at the University of Texas at Austin.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.0249.

Volume 72, Issue 4

July-August 2024

Pages iii-vi, 1317-1750, C2-C3

Article Information

Received: May 16, 2022
Accepted: August 28, 2023
Published Online: October 6, 2023

Cite as

Zaiwei Chen, Siva T. Maguluri, Sanjay Shakkottai, Karthikeyan Shanmugam (2023) A Lyapunov Theory for Finite-Sample Guarantees of Markovian Stochastic Approximation. Operations Research 72(4):1352-1367.

https://doi.org/10.1287/opre.2022.0249

Acknowledgments

Z. Chen moved to Caltech as a postdoctoral fellow in August 2022; this work was done while Z. Chen was affiliated with Georgia Tech. K. Shanmugam moved to Google Research India (Bengaluru) in April 2022; this work was done while K. Shanmugam was affiliated with IBM.
