Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Colt (Citeseer), 263–274.Google Scholar
Agarwal A, Agarwal S, Assadi S, Khanna S (2017) Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. Conf. Learn. Theory (PMLR, New York), 39–75.Google Scholar
Audibert JY, Tsybakov AB (2007) Fast learning rates for plug-in classifiers. Ann. Statist. 35(2):608–633.Crossref, Google Scholar
Audibert JY, Bubeck S, Lugosi G (2014) Regret in online combinatorial optimization. Math. Oper. Res. 39(1):31–45.Link, Google Scholar
Auer P, Gajane P, Ortner R (2019) Adaptively tracking the best bandit arm with an unknown number of distribution changes. Conf. Learn. Theory (PMLR, New York), 138–158.Google Scholar
Awerbuch B, Kleinberg RD (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. Thirty-Sixth Annual ACM Symposium Theory Comput. (Association for Computing Machinery, New York), 45–53.Google Scholar
Bernstein F, Modaresi S, Sauré D (2019) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.Abstract, Google Scholar
Berry DA, Chen RW, Zame A, Heath DC, Shepp LA (1997) Bandit problems with infinitely many arms. Ann. Statist. 25(5):2103–2116.Crossref, Google Scholar
Besbes O, Zeevi A (2011) On the minimax complexity of pricing in a changing environment. Oper. Res. 59(1):66–79.Google Scholar
Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Adv. Neural Inform. Processing Systems 27.Google Scholar
Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.Link, Google Scholar
Chakrabarti D, Kumar R, Radlinski F, Upfal E (2008) Mortal multi-armed bandits. Adv. Neural Inform. Processing Systems 21:273–280.Google Scholar
Chen Y, Lee CW, Luo H, Wei CY (2019) A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. Conf. Learn. Theory (PMLR, New York), 696–726.Google Scholar
Cui R, Zhang DJ, Bassamboo A (2019) Learning from inventory availability information: Evidence from field experiments on amazon. Management Sci. 65(3):1216–1235.Link, Google Scholar
Dani V, Hayes TP (2006) Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. Soda 6:937–943.Google Scholar
Dani V, Kakade SM, Hayes T (2007) The price of bandit information for online optimization. Adv. Neural Inform. Processing Systems 20.Google Scholar
Dong S, Van Roy B (2018) An information-theoretic analysis for Thompson sampling with many actions. Adv. Neural Inform. Processing Systems 31.Google Scholar
Farias VF, Madan R (2011) The irrevocable multiarmed bandit problem. Oper. Res. 59(2):383–399.Link, Google Scholar
Feldman J, Zhang DJ, Liu X, Zhang N (2022) Customer choice models vs. machine learning: Finding optimal product displays on Alibaba. Oper. Res. 70(1):309–328.Link, Google Scholar
Friedman LM, Furberg CD, DeMets DL, Reboussin DM, Granger CB (2015) Fundamentals of Clinical Trials (Springer, New York).Crossref, Google Scholar
Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems 32.Google Scholar
Gauthier CS, Gaudel R, Fromont E (2022) Unirank: Unimodal bandit algorithms for online ranking. Internat. Conf. Machine Learn. (PMLR, New York), 7279–7309.Google Scholar
Ghosal S (2001) Convergence rates for density estimation with Bernstein polynomials. Ann. Statist. 29(5):1264–1280.Crossref, Google Scholar
Greene WH (2003) Econometric Analysis (Pearson Education India, India).Google Scholar
György A, Linder T, Lugosi G, Ottucsák G (2007) The on-line shortest path problem under partial monitoring. J. Machine Learn. Res. 8(10):2369–2403.Google Scholar
Ito S (2021) Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits. Adv. Neural Inform. Processing Systems 34:2654–2667.Google Scholar
Jia S, Kallus N, Yu CL (2023a) Clustered switchback experiments: Near-optimal rates under spatiotemporal interference. Preprint, submitted December 25, https://arxiv.org/html/2312.15574v2.Google Scholar
Jia S, Xie Q, Kallus N, Frazier PI (2023b) Smooth non-stationary bandits. Internat. Conf. Machine Learn. (PMLR, New York), 14930–14944.Google Scholar
Jun KS, Jamieson K, Nowak R, Zhu X (2016) Top arm identification in multi-armed bandits with batch arm pulls. Artificial Intelligence Statist. (PMLR, New York), 139–148.Google Scholar
Kadakia PM (2023) Inmobi’s glance launches in japan, aims for 40 percent of android market. Forbes India (August 28). https://www.forbesindia.com/article/take-one-big-story-of-the-day/inmobis-glance-launches-in-japan-aims-for-40-percent-of-android-market/87801/1.Google Scholar
Kallus N, Udell M (2020) Dynamic assortment personalization in high dimensions. Oper. Res. 68(4):1020–1037.Link, Google Scholar
Keskin NB, Zeevi A (2017) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.Google Scholar
Keskin NB, Min X, Song JSJ (2021) The nonstationary newsvendor: Data-driven nonparametric learning. Preprint, submitted June 15, https://doi.org/10.2139/ssrn.3866171Google Scholar
Kohavi R, Longbotham R (2017) Online controlled experiments and a/b testing. Encyclopedia Machine Learn. Data Mining 7(8):922–929.Crossref, Google Scholar
Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Internat. Conf. Machine Learn. (PMLR, New York), 1152–1161.Google Scholar
Kveton B, Wen Z, Ashkan A, Szepesvari C (2015) Tight regret bounds for stochastic combinatorial semi-bandits. Artificial Intelligence Statist. (PMLR, New York), 535–543.Google Scholar
Lagrée P, Vernade C, Cappe O (2016) Multiple-play bandits in the position-based model. Adv. Neural Inform. Processing Systems 29.Google Scholar
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.Crossref, Google Scholar
Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
Levine N, Crammer K, Mannor S (2017) Rotting bandits. Adv. Neural Inform. Processing Systems 30.Google Scholar
Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.Google Scholar
Liu Y, Li L (2021) A map of bandits for e-commerce. Preprint, submitted July 1, https://arxiv.org/abs/2107.00680.Google Scholar
Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Conf. Learn. Theory (PMLR, New York), 1739–1776.Google Scholar
Mao J, Bojinov I (2021) Quantifying the value of iterative experimentation. Preprint, submitted November 3, https://arxiv.org/abs/2111.02334.Google Scholar
McFarland C (2012) Experiment!: Website Conversion Rate Optimization with A/B and Multivariate Testing (New Riders, Berkeley, CA).Google Scholar
McMahan HB, Blum A (2004) Online geometric optimization in the bandit setting against an adaptive adversary. Learn. Theory: 17th Annual Conf. Learning Theory, COLT 2004, Banff, Canada, July 1–4, 2004. Proc., vol. 17 (Springer, Berlin, Heidelberg), 109–123. Google Scholar
Neu G (2015) First-order regret bounds for combinatorial semi-bandits. Conf. Learn. Theory (PMLR, New York), 1360–1375.Google Scholar
Neu G, Bartók G (2016) Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. J. Machine Learn. Res. 17(154):1–21.Google Scholar
Oli N, Patel A, Sharma V, Dacharaju SD, Ikhar S (2020) Personalizing multi-modal content for a diverse audience: A scalable deep learning approach. Accessed August 25, 2020, https://irsworkshop.github.io/2020/publications/paper_18_Oli_MultiModal.pdf.Google Scholar
Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.Crossref, Google Scholar
Petrone S, Wasserman L (2002) Consistency of Bernstein polynomial posteriors. J. Roy. Statist. Soc. Ser. B (Statist. Methodology) 64(1):79–100.Crossref, Google Scholar
Pocock SJ (2013) Clinical Trials: A Practical Approach (John Wiley & Sons, Hoboken, NJ).Crossref, Google Scholar
Radlinski F, Kleinberg R, Joachims T (2008) Learning diverse rankings with multi-armed bandits. Proc. 25th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 784–791.Google Scholar
Sankararaman KA, Slivkins A (2018) Combinatorial semi-bandits with knapsacks. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1760–1770.Google Scholar
Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.Link, Google Scholar
Seznec J, Locatelli A, Carpentier A, Lazaric A, Valko M (2019) Rotting bandits are no harder than stochastic ones. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2564–2572.Google Scholar
Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.Crossref, Google Scholar
Terwiesch C, Olivares M, Staats BR, Gaur V (2020) OM Forum—A review of empirical operations management over the last two decades. Manufacturing Service Oper. Management 22(4):656–668.Link, Google Scholar
Thomke SH (2020) Experimentation Works: The Surprising Power of Business Experiments (Harvard Business Press, Brighton, MA).Google Scholar
Vuleta B (2021) How much data is created every day? https://seedscientific.com/how-much-data-is-created-every-day/.Google Scholar
Wang Y, Audibert JY, Munos R (2008) Algorithms for infinitely many-armed bandits. Adv. Neural Inform. Processing Systems 21.Google Scholar
Wei CY, Luo H (2018) More adaptive algorithms for adversarial bandits. Conf. Learn. Theory (PMLR, New York), 1263–1291.Google Scholar
Xiong R, Athey S, Bayati M, Imbens G (2019) Optimal experimental design for staggered rollouts. Preprint, submitted November 9, https://doi.org/10.2139/ssrn.3483934.Google Scholar
Xu Y, Duan W, Huang S (2018) SQR: Balancing speed, quality and risk in online experiments. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 895–904.Google Scholar
Xu Y, Chen N, Fernandez A, Sinno O, Bhasin A (2015) From infrastructure to culture: A/b testing challenges in large scale social networks. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2227–2236.Google Scholar
Yang F, Ramdas A, Jamieson KG, Wainwright MJ (2017) A framework for multi-a(rmed)/b(andit) testing with online FDR control. Adv. Neural Inform. Processing Systems 30.Google Scholar
Zeng Z, Dai H, Zhang DJ, Zhang H, Zhang R, Xu Z, Shen ZJM (2022) The impact of social nudges on user-generated content for social network platforms. Management Sci. 69(9):5189–5208.Link, Google Scholar
Zhang X, Frazier PI (2021) Restless bandits with many arms: Beating the central limit theorem. Preprint, submitted July 25, https://arxiv.org/abs/2107.11911.Google Scholar
Zhang DJ, Dai H, Dong L, Qi F, Zhang N, Liu X, Liu Z, Yang J (2020) The long-term and spillover effects of price promotions on retailing platforms: Evidence from a large randomized experiment on Alibaba. Management Sci. 66(6):2589–2609.Link, Google Scholar
Zimmert J, Luo H, Wei CY (2019) Beating stochastic and adversarial semi-bandits optimally and simultaneously. Internat. Conf. Machine Learn. (PMLR, New York), 7683–7692.Google Scholar

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Received:October 15, 2023
Accepted:February 22, 2026
Published Online:April 21, 2026

Cite as

Su Jia, Andrew Li, R. Ravi, Nishant Oli, Paul Duff, Ian Anderson (2026) Short-Lived High-Volume Bandits. Operations Research 0(0).

https://doi.org/10.1287/opre.2023.0557

Keywords

Acknowledgments

The authors thank Sai Dinesh Dacharaju, Farhat Habib, and Alan L. Montgomery for helpful discussions. A preliminary version of this work appeared in ICML 2023. This work substantially expands the proceedings version by including (i) proof sketches for the results, (ii) new theoretical results based on discussions from the ICML presentation, (iii) additional analysis of the field experiment on engaged users, and (iv) offline simulations using real-world data.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Short-Lived High-Volume Bandits

References

Articles In Advance

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News