Generalized Eluder Coefficient: A Unified Framework for Interactive Decision Making in Markov Decision Processes, Partially Observable Markov Decision Processes, and Beyond

Han Zhong
https://orcid.org/0009-0009-5250-620X
Center for Data Science, Peking University, Beijing 100871, China

Wei Xiong
Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801

Sirui Zheng
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Liwei Wang
School of Intelligence Science and Technology, Peking University, Beijing 100871, China

Zhaoran Wang
Corresponding Author
https://orcid.org/0000-0002-1824-2580
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208

Zhuoran Yang
https://orcid.org/0000-0001-5269-9958
Department of Statistics and Data Science, Yale University, New Haven, Connecticut 06511

Tong Zhang
https://orcid.org/0000-0002-5511-2558
Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois 61801

Published Online: 15 Jun 2026
https://doi.org/10.1287/moor.2023.0195

Abstract

We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes the Markov decision process, partially observable Markov decision process, and predictive state representation (PSR) as special cases. We propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental trade-off between exploration and exploitation in online interactive decision making in the context of function approximation. We show that RL problems with low GECs form a remarkably rich class, which subsumes low Bellman eluder dimension problems, the bilinear class, low witness rank problems, the partially observable bilinear (PO-bilinear) class, and the generalized regular PSR. The generalized regular PSR, a new tractable PSR class that we identify, includes nearly all known tractable partially observable RL models. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashions, under both fully observable and partially observable settings. We prove that the proposed generic posterior sampling algorithm is sample efficient by establishing a sublinear regret upper bound in terms of the GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.

Articles In Advance

Information

Received: April 19, 2023
Accepted: May 17, 2025
Published Online: June 15, 2026

Cite as

Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang (2026) Generalized Eluder Coefficient: A Unified Framework for Interactive Decision Making in Markov Decision Processes, Partially Observable Markov Decision Processes, and Beyond. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0195

Acknowledgments

The authors thank Yu Bai and Song Mei for pointing out a technical issue in the first version of this paper regarding the ℓ2 eluder technique.
