Generalized Eluder Coefficient: A Unified Framework for Interactive Decision Making in Markov Decision Processes, Partially Observable Markov Decision Processes, and Beyond
Abstract
We study sample-efficient reinforcement learning (RL) under the general framework of interactive decision making, which includes the Markov decision process, partially observable Markov decision process, and predictive state representation (PSR) as special cases. We propose a novel complexity measure, the generalized eluder coefficient (GEC), which characterizes the fundamental trade-off between exploration and exploitation in online interactive decision making in the context of function approximation. We show that RL problems with low GECs form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, Partially observable bilinear (PO-bilinear) class, and generalized regular PSR, where generalized regular PSR, a new tractable PSR class identified by us, includes nearly all known tractable partially observable RL models. Furthermore, in terms of algorithm design, we propose a generic posterior sampling algorithm, which can be implemented in both model-free and model-based fashions, under both fully observable and partially observable settings. We prove that the proposed generic posterior sampling algorithm is sample efficient by establishing a sublinear regret upper bound in terms of the GEC. In summary, we provide a new and unified understanding of both fully observable and partially observable RL.

