Rate-Optimal Bayesian Simple Regret in Best Arm Identification

Junpei Komiyama (Corresponding Author)
https://orcid.org/0000-0003-0095-6558
Stern School of Business, New York University, New York, New York 10012

Kaito Ariu
https://orcid.org/0000-0001-6286-9906
Artificial Intelligence Laboratory, CyberAgent, Inc., Shibuya, Tokyo 150-0042, Japan; School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, 114 28 Stockholm, Sweden

Masahiro Kato
Artificial Intelligence Laboratory, CyberAgent, Inc., Shibuya, Tokyo 150-0042, Japan

Chao Qin
Columbia Business School, Columbia University, New York, New York 10027

Published Online: 25 Aug 2023
https://doi.org/10.1287/moor.2022.0011

Abstract

We consider best arm identification in the multiarmed bandit problem. Assuming certain continuity conditions of the prior, we characterize the rate of the Bayesian simple regret. Unlike in Bayesian regret minimization, the leading term in the Bayesian simple regret derives from the region in which the gap between optimal and suboptimal arms is smaller than $\sqrt{(\log T) / T}$. We propose a simple and easy-to-compute algorithm whose leading term matches the lower bound up to a constant factor; simulation results support our theoretical findings.
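To make the quantity in the abstract concrete, the following is a minimal Monte Carlo sketch of the Bayesian simple regret, i.e., the prior-averaged difference between the best arm's mean and the recommended arm's mean after T pulls. Everything specific here is an assumption made for illustration and is not taken from the paper: the uniform prior on [0, 1], the Bernoulli rewards, the uniform allocation of pulls, the empirical-best recommendation rule, and the function name `bayesian_simple_regret`. It is not the algorithm proposed by the authors. The second returned value is the share of the estimated regret contributed by instances whose gap between the best and second-best arm is below $\sqrt{(\log T) / T}$.

```python
import numpy as np


def bayesian_simple_regret(K=2, T=200, n_instances=20000, seed=0):
    """Monte Carlo estimate of the Bayesian simple regret (illustrative only).

    Assumptions (not from the paper): uniform prior on [0, 1] for each arm
    mean, Bernoulli rewards, T // K pulls per arm, and recommendation of the
    empirically best arm.
    """
    rng = np.random.default_rng(seed)

    # Draw one bandit instance per row from the assumed uniform prior.
    mu = rng.uniform(0.0, 1.0, size=(n_instances, K))

    # Uniform allocation: each arm is pulled T // K times.
    pulls = T // K
    successes = rng.binomial(pulls, mu)

    # Recommend the arm with the most successes (ties go to the lowest index).
    rec = successes.argmax(axis=1)

    # Simple regret of each instance: best mean minus recommended arm's mean.
    regret = mu.max(axis=1) - mu[np.arange(n_instances), rec]

    # Gap between the best and second-best arm in each instance.
    sorted_mu = np.sort(mu, axis=1)
    gap = sorted_mu[:, -1] - sorted_mu[:, -2]
    small_gap = gap < np.sqrt(np.log(T) / T)

    total = float(regret.mean())
    regret_sum = float(regret.sum())
    share_small = float(regret[small_gap].sum()) / regret_sum if regret_sum > 0 else 0.0
    return total, share_small


if __name__ == "__main__":
    total, share_small = bayesian_simple_regret()
    print(f"estimated Bayesian simple regret: {total:.5f}")
    print(f"share from gaps below sqrt(log T / T): {share_small:.3f}")
```

Running the script prints both estimates for the default two-armed setting; varying `T` gives a rough empirical feel for how the regret shrinks with the budget under these assumed conditions.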

Mathematics of Operations Research

Volume 49, Issue 3

August 2024

Pages 1303-2047, C2

Article Information

Received: January 07, 2022
Accepted: July 09, 2023
Published Online: August 25, 2023

Cite as

Junpei Komiyama, Kaito Ariu, Masahiro Kato, Chao Qin (2023) Rate-Optimal Bayesian Simple Regret in Best Arm Identification. Mathematics of Operations Research 49(3):1629-1646.

https://doi.org/10.1287/moor.2022.0011

Acknowledgments

The authors thank Po-An Wang and Kenshi Abe for several suggestions. The authors thank Assaf Zeevi for discussion on the stability of algorithms.
