Generalized Eluder Coefficient: A Unified Framework for Interactive Decision Making in Markov Decision Processes, Partially Observable Markov Decision Processes, and Beyond

Published Online:https://doi.org/10.1287/moor.2023.0195

We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes the Markov decision process, partially observable Markov decision process, and predictive state representation (PSR) as special cases. We propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental trade-off between exploration and exploitation in online interactive decision making in the context of function approximation. We show that RL problems with low GECs form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, Partially observable bilinear (PO-bilinear) class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable partially observable RL models. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashions, under both fully observable and partially observable settings. We prove that the proposed generic posterior sampling algorithm is sample efficient by establishing a sublinear regret upper bound in terms of the GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.