Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity

Imon Banerjee (Corresponding Author)
https://orcid.org/0000-0003-2572-3048
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Harsha Honnappa
https://orcid.org/0000-0002-0834-054X
Edwardson School of Industrial Engineering, Purdue University, West Lafayette, Indiana 47907

Vinayak Rao
https://orcid.org/0000-0002-6249-2923
Department of Statistics, Purdue University, West Lafayette, Indiana 47907

Published Online: 21 Feb 2025
https://doi.org/10.1287/opre.2023.0046

Abstract

In this work, we study a natural nonparametric estimator of the transition probability matrices of a finite controlled Markov chain. We consider an off-line setting with a fixed data set of size m, collected using a so-called logging policy. We develop sample complexity bounds for the estimator and establish conditions for minimaxity. Our statistical bounds depend on the logging policy through its mixing properties. We show that achieving a particular statistical risk bound involves a subtle and interesting trade-off between the strength of the mixing properties and the number of samples. We demonstrate the validity of our results under various examples, such as ergodic Markov chains; weakly ergodic inhomogeneous Markov chains; and controlled Markov chains with nonstationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for off-line evaluation of stationary Markov control policies.
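As a rough illustration of the kind of estimator the abstract refers to, the sketch below computes empirical transition frequencies from a single logged trajectory. It assumes the "natural nonparametric estimator" is the count-based estimate (transitions from state s under action a into state s', divided by visits to the pair (s, a)); the abstract does not spell this out. The function name, the argument layout, and the uniform fallback for unvisited state-action pairs are hypothetical choices made only for this example.

```python
import numpy as np


def estimate_transitions(states, actions, n_states, n_actions):
    """Count-based estimate of P(s' | s, a) from one logged trajectory.

    states  : m + 1 visited states, integers in [0, n_states)
    actions : m logged actions, integers in [0, n_actions)
    Returns (p_hat, visits), where p_hat has shape
    (n_states, n_actions, n_states) and visits has shape (n_states, n_actions).
    """
    states = np.asarray(states)
    actions = np.asarray(actions)
    if len(states) != len(actions) + 1:
        raise ValueError("need exactly one more state than actions")

    # counts[s, a, t] = number of observed transitions s -> t under action a
    counts = np.zeros((n_states, n_actions, n_states))
    np.add.at(counts, (states[:-1], actions, states[1:]), 1.0)

    # visits[s, a] = number of times the pair (s, a) was observed
    visits = counts.sum(axis=2, keepdims=True)

    # Assumption for this sketch: unvisited pairs get a uniform row.
    p_hat = np.where(visits > 0, counts / np.maximum(visits, 1.0), 1.0 / n_states)
    return p_hat, visits[..., 0]


# Toy data set of size m = 4 on two states and two actions.
p_hat, visits = estimate_transitions([0, 1, 0, 1, 1], [0, 0, 0, 1], 2, 2)
```

The visit counts returned alongside the estimate show how unevenly a logging policy covers the state-action pairs, which is the quantity the sample complexity bounds in the abstract are sensitive to through the mixing properties.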

Funding: I. Banerjee was supported in part by the Ross-Lynn fellowship and McLean scholarship at Purdue University. H. Honnappa was partly supported by the National Science Foundation [Grants CAREER/2143752, DMS/1812197 and DMS/2153915]. V. Rao was supported by the National Science Foundation [Grants RI/1816499 and DMS/1812197].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2023.0046.

Volume 73, Issue 4

July-August 2025

Pages iii-viii, 1723-2295, C2-C3

Received: January 29, 2023
Accepted: November 26, 2024
Published Online: February 21, 2025

Cite as

Imon Banerjee, Harsha Honnappa, Vinayak Rao (2025) Off-line Estimation of Controlled Markov Chains: Minimaxity and Sample Complexity. Operations Research 73(4):2281-2295.

https://doi.org/10.1287/opre.2023.0046

Acknowledgments

I. Banerjee thanks Anamitra Chaudhuri for numerous insightful discussions and comments throughout the duration of this project. The authors thank the anonymous reviewers for their insightful comments and especially for pointing them toward the interesting application detailed in Section 4.5. Work was completed while Imon Banerjee was at the Department of Statistics, Purdue University.
