Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability

David Simchi-Levi
https://orcid.org/0000-0002-4650-1519
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Department of Civil and Environmental Engineering and Operations Research Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Yunzong Xu
https://orcid.org/0000-0002-1682-419X
Institute for Data, Systems, and Society, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; Statistics and Data Science Center, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139

Published Online: 9 Dec 2021. https://doi.org/10.1287/moor.2021.1193

Abstract

We consider the general (stochastic) contextual bandit problem under the realizability assumption, that is, the expected reward, as a function of contexts and actions, belongs to a general function class $F$. We design a fast and simple algorithm that achieves the statistically optimal regret with only $O(\log T)$ calls to an offline regression oracle across all $T$ rounds. The number of oracle calls can be further reduced to $O(\log\log T)$ if $T$ is known in advance. Our results provide the first universal and optimal reduction from contextual bandits to offline regression, solving an important open problem in the contextual bandit literature. A direct consequence of our results is that any advances in offline regression immediately translate to contextual bandits, statistically and computationally. This leads to faster algorithms and improved regret guarantees for broader classes of contextual bandit problems.
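The oracle-call counts in the abstract can be illustrated with a small Python sketch. It assumes (the abstract itself does not say so) that the algorithm runs in epochs, makes one offline regression oracle call at the start of each epoch, and, within an epoch, samples actions by inverse gap weighting over the fitted model's predicted rewards. The epoch endpoints `tau_m = 2**m` (unknown T) and `tau_m = ceil(2 * T**(1 - 2**-m))` (known T) are likewise assumptions of this sketch. The function names and the exploration parameter `gamma` are hypothetical, and the regression oracle itself is omitted.

```python
import math
import random

# Hypothetical sketch: epochs end at rounds tau_1 < tau_2 < ... <= T, and one
# offline regression oracle call is assumed at each epoch boundary, so the
# number of oracle calls equals the number of epochs returned below.


def doubling_schedule(T: int) -> list[int]:
    """Assumed epoch endpoints tau_m = 2**m for an unknown horizon, capped at T."""
    taus, m = [], 1
    while True:
        tau = 2 ** m
        taus.append(min(tau, T))
        if tau >= T:
            return taus
        m += 1


def known_horizon_schedule(T: int) -> list[int]:
    """Assumed endpoints tau_m = ceil(2 * T**(1 - 2**-m)) for a known horizon T."""
    taus, m = [], 1
    while True:
        tau = math.ceil(2 * T ** (1 - 2.0 ** (-m)))
        taus.append(min(tau, T))
        if tau >= T:
            return taus
        m += 1


def inverse_gap_weighting(preds: list[float], gamma: float) -> list[float]:
    """Action probabilities from predicted rewards via inverse gap weighting.

    Each non-greedy action a gets 1 / (K + gamma * gap(a)); for gamma >= 0 the
    greedy action takes the remaining mass, which is at least 1/K.
    """
    K = len(preds)
    best = max(range(K), key=lambda a: preds[a])
    probs = [0.0] * K
    for a in range(K):
        if a != best:
            probs[a] = 1.0 / (K + gamma * (preds[best] - preds[a]))
    probs[best] = 1.0 - sum(probs)
    return probs


def sample_action(preds: list[float], gamma: float, rng: random.Random) -> int:
    """Draw one action index from the inverse-gap-weighted distribution."""
    probs = inverse_gap_weighting(preds, gamma)
    return rng.choices(range(len(preds)), weights=probs, k=1)[0]


if __name__ == "__main__":
    T = 10**6
    print(len(doubling_schedule(T)), len(known_horizon_schedule(T)))  # 20 5
    print(inverse_gap_weighting([0.9, 0.5, 0.1], gamma=10.0))
    print(sample_action([0.9, 0.5, 0.1], gamma=10.0, rng=random.Random(0)))
```

With T = 10**6, the doubling schedule has 20 epochs (roughly log2 T), while the known-horizon schedule has only 5 (roughly log2 log2 T), which mirrors the gap between the $O(\log T)$ and $O(\log\log T)$ call counts. Inverse gap weighting keeps every action's probability positive but shrinks it as the action's predicted reward falls further behind the greedy choice, with `gamma` controlling how quickly.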

Mathematics of Operations Research

Volume 47, Issue 3

August 2022

Issue pages 1707-2545, C2

Article Information


Received: April 11, 2020
Accepted: June 11, 2021
Published Online: December 09, 2021

Cite as

David Simchi-Levi, Yunzong Xu (2021) Bypassing the Monster: A Faster and Simpler Optimal Algorithm for Contextual Bandits Under Realizability. Mathematics of Operations Research 47(3):1904-1931.

https://doi.org/10.1287/moor.2021.1193
