Modern platforms leverage randomized experiments to make informed decisions from a given set of items (treatment arms). As a particularly challenging scenario, these items can (i) arrive in a high volume, with thousands of new items being released per hour, and (ii) have a short lifetime due to their transient nature. We study a Bayesian multiple-play bandit problem that encapsulates the key features of this scenario. In each round, a set of arms arrives. Each arm has a lifetime $w$ and an unknown mean reward. The learner selects a multiset of $n$ arms and receives observable rewards for each play. We aim to minimize the loss due to not knowing the reward rates. We show that if at most $n^{\rho}$ arms arrive per round, then our policy has a $\tilde{O}\left(n^{-\min\left\{\rho, \frac{1}{2}\left(1 + \frac{1}{w}\right)^{-1}\right\}}\right)$ loss on a sufficiently large class of prior distributions for the mean rewards. We complement this by showing that all policies suffer an $\Omega\left(n^{-\min\left\{\rho, \frac{1}{2}\right\}}\right)$ loss. We further validate the effectiveness of our policy through a large-scale field experiment on Glance, a content card service platform that faces exactly this challenge. A simple variant of our policy outperformed the current recommender at the time by 4.32% in total duration and 7.48% in total number of click-throughs.
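To make the comparison between the two bounds concrete, the following minimal sketch evaluates the exponents stated above for given values of $\rho$ and $w$. It only performs the arithmetic of the two formulas; the function names are hypothetical and do not come from the paper or its supplemental code, and the constants and logarithmic factors hidden by $\tilde{O}$ and $\Omega$ are ignored.

```python
from fractions import Fraction


def upper_exponent(rho, w):
    """Exponent a in the stated upper bound O~(n^-a): min{rho, (1/2)(1 + 1/w)^-1}.

    Hypothetical helper for illustration; w is the arm lifetime (positive integer).
    """
    lifetime_term = Fraction(1, 2) / (1 + Fraction(1, w))  # equals w / (2(w + 1))
    return min(rho, lifetime_term)


def lower_exponent(rho):
    """Exponent b in the stated lower bound Omega(n^-b): min{rho, 1/2}."""
    return min(rho, Fraction(1, 2))


if __name__ == "__main__":
    # Compare the two exponents for a few illustrative (rho, w) pairs.
    for rho, w in [(Fraction(1), 1), (Fraction(1), 9), (Fraction(1, 10), 1)]:
        print(f"rho={rho}, w={w}: upper {upper_exponent(rho, w)}, lower {lower_exponent(rho)}")
```

Under these formulas, the lifetime term equals $\frac{w}{2(w+1)}$, which is $\frac{1}{4}$ at $w = 1$ and approaches $\frac{1}{2}$ as $w$ grows. The two exponents coincide whenever $\rho \le \frac{w}{2(w+1)}$, since both then equal $\rho$.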

Funding: R. Ravi is supported in part by the U.S. Office of Naval Research [Award Number N00014-21-1-2243] and the Air Force Office of Scientific Research [Award Number FA9550-23-1-0031]. A. Li is in part supported by NSF CAREER [Grant 2238489].

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0557.

Articles In Advance

Information

Received: October 15, 2023
Accepted: February 22, 2026
Published Online: April 21, 2026

Cite as

Su Jia, Andrew Li, R. Ravi, Nishant Oli, Paul Duff, Ian Anderson (2026) Short-Lived High-Volume Bandits. Operations Research 0(0).

https://doi.org/10.1287/opre.2023.0557

Acknowledgments

The authors thank Sai Dinesh Dacharaju, Farhat Habib, and Alan L. Montgomery for helpful discussions. A preliminary version of this work appeared in ICML 2023. This work substantially expands the proceedings version by including (i) proof sketches for the results, (ii) new theoretical results based on discussions from the ICML presentation, (iii) additional analysis of the field experiment on engaged users, and (iv) offline simulations using real-world data.
