Short-Lived High-Volume Bandits

Published Online:https://doi.org/10.1287/opre.2023.0557

References

  • Abernethy JD, Hazan E, Rakhlin A (2008) Competing in the dark: An efficient algorithm for bandit linear optimization. Colt (Citeseer), 263–274.Google Scholar
  • Agarwal A, Agarwal S, Assadi S, Khanna S (2017) Learning with limited rounds of adaptivity: Coin tossing, multi-armed bandits, and ranking from pairwise comparisons. Conf. Learn. Theory (PMLR, New York), 39–75.Google Scholar
  • Audibert JY, Tsybakov AB (2007) Fast learning rates for plug-in classifiers. Ann. Statist. 35(2):608–633.CrossrefGoogle Scholar
  • Audibert JY, Bubeck S, Lugosi G (2014) Regret in online combinatorial optimization. Math. Oper. Res. 39(1):31–45.LinkGoogle Scholar
  • Auer P, Gajane P, Ortner R (2019) Adaptively tracking the best bandit arm with an unknown number of distribution changes. Conf. Learn. Theory (PMLR, New York), 138–158.Google Scholar
  • Awerbuch B, Kleinberg RD (2004) Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. Proc. Thirty-Sixth Annual ACM Symposium Theory Comput. (Association for Computing Machinery, New York), 45–53.Google Scholar
  • Bernstein F, Modaresi S, Sauré D (2019) A dynamic clustering approach to data-driven assortment personalization. Management Sci. 65(5):2095–2115.AbstractGoogle Scholar
  • Berry DA, Chen RW, Zame A, Heath DC, Shepp LA (1997) Bandit problems with infinitely many arms. Ann. Statist. 25(5):2103–2116.CrossrefGoogle Scholar
  • Besbes O, Zeevi A (2011) On the minimax complexity of pricing in a changing environment. Oper. Res. 59(1):66–79.Google Scholar
  • Besbes O, Gur Y, Zeevi A (2014) Stochastic multi-armed-bandit problem with non-stationary rewards. Adv. Neural Inform. Processing Systems 27.Google Scholar
  • Besbes O, Gur Y, Zeevi A (2015) Non-stationary stochastic optimization. Oper. Res. 63(5):1227–1244.LinkGoogle Scholar
  • Chakrabarti D, Kumar R, Radlinski F, Upfal E (2008) Mortal multi-armed bandits. Adv. Neural Inform. Processing Systems 21:273–280.Google Scholar
  • Chen Y, Lee CW, Luo H, Wei CY (2019) A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free. Conf. Learn. Theory (PMLR, New York), 696–726.Google Scholar
  • Cui R, Zhang DJ, Bassamboo A (2019) Learning from inventory availability information: Evidence from field experiments on amazon. Management Sci. 65(3):1216–1235.LinkGoogle Scholar
  • Dani V, Hayes TP (2006) Robbing the bandit: Less regret in online geometric optimization against an adaptive adversary. Soda 6:937–943.Google Scholar
  • Dani V, Kakade SM, Hayes T (2007) The price of bandit information for online optimization. Adv. Neural Inform. Processing Systems 20.Google Scholar
  • Dong S, Van Roy B (2018) An information-theoretic analysis for Thompson sampling with many actions. Adv. Neural Inform. Processing Systems 31.Google Scholar
  • Farias VF, Madan R (2011) The irrevocable multiarmed bandit problem. Oper. Res. 59(2):383–399.LinkGoogle Scholar
  • Feldman J, Zhang DJ, Liu X, Zhang N (2022) Customer choice models vs. machine learning: Finding optimal product displays on Alibaba. Oper. Res. 70(1):309–328.LinkGoogle Scholar
  • Friedman LM, Furberg CD, DeMets DL, Reboussin DM, Granger CB (2015) Fundamentals of Clinical Trials (Springer, New York).CrossrefGoogle Scholar
  • Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Adv. Neural Inform. Processing Systems 32.Google Scholar
  • Gauthier CS, Gaudel R, Fromont E (2022) Unirank: Unimodal bandit algorithms for online ranking. Internat. Conf. Machine Learn. (PMLR, New York), 7279–7309.Google Scholar
  • Ghosal S (2001) Convergence rates for density estimation with Bernstein polynomials. Ann. Statist. 29(5):1264–1280.CrossrefGoogle Scholar
  • Greene WH (2003) Econometric Analysis (Pearson Education India, India).Google Scholar
  • György A, Linder T, Lugosi G, Ottucsák G (2007) The on-line shortest path problem under partial monitoring. J. Machine Learn. Res. 8(10):2369–2403.Google Scholar
  • Ito S (2021) Hybrid regret bounds for combinatorial semi-bandits and adversarial linear bandits. Adv. Neural Inform. Processing Systems 34:2654–2667.Google Scholar
  • Jia S, Kallus N, Yu CL (2023a) Clustered switchback experiments: Near-optimal rates under spatiotemporal interference. Preprint, submitted December 25, https://arxiv.org/html/2312.15574v2.Google Scholar
  • Jia S, Xie Q, Kallus N, Frazier PI (2023b) Smooth non-stationary bandits. Internat. Conf. Machine Learn. (PMLR, New York), 14930–14944.Google Scholar
  • Jun KS, Jamieson K, Nowak R, Zhu X (2016) Top arm identification in multi-armed bandits with batch arm pulls. Artificial Intelligence Statist. (PMLR, New York), 139–148.Google Scholar
  • Kadakia PM (2023) Inmobi’s glance launches in japan, aims for 40 percent of android market. Forbes India (August 28). https://www.forbesindia.com/article/take-one-big-story-of-the-day/inmobis-glance-launches-in-japan-aims-for-40-percent-of-android-market/87801/1.Google Scholar
  • Kallus N, Udell M (2020) Dynamic assortment personalization in high dimensions. Oper. Res. 68(4):1020–1037.LinkGoogle Scholar
  • Keskin NB, Zeevi A (2017) Chasing demand: Learning and earning in a changing environment. Math. Oper. Res. 42(2):277–307.Google Scholar
  • Keskin NB, Min X, Song JSJ (2021) The nonstationary newsvendor: Data-driven nonparametric learning. Preprint, submitted June 15, https://doi.org/10.2139/ssrn.3866171Google Scholar
  • Kohavi R, Longbotham R (2017) Online controlled experiments and a/b testing. Encyclopedia Machine Learn. Data Mining 7(8):922–929.CrossrefGoogle Scholar
  • Komiyama J, Honda J, Nakagawa H (2015) Optimal regret analysis of Thompson sampling in stochastic multi-armed bandit problem with multiple plays. Internat. Conf. Machine Learn. (PMLR, New York), 1152–1161.Google Scholar
  • Kveton B, Wen Z, Ashkan A, Szepesvari C (2015) Tight regret bounds for stochastic combinatorial semi-bandits. Artificial Intelligence Statist. (PMLR, New York), 535–543.Google Scholar
  • Lagrée P, Vernade C, Cappe O (2016) Multiple-play bandits in the position-based model. Adv. Neural Inform. Processing Systems 29.Google Scholar
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.CrossrefGoogle Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Levine N, Crammer K, Mannor S (2017) Rotting bandits. Adv. Neural Inform. Processing Systems 30.Google Scholar
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.Google Scholar
  • Liu Y, Li L (2021) A map of bandits for e-commerce. Preprint, submitted July 1, https://arxiv.org/abs/2107.00680.Google Scholar
  • Luo H, Wei CY, Agarwal A, Langford J (2018) Efficient contextual bandits in non-stationary worlds. Conf. Learn. Theory (PMLR, New York), 1739–1776.Google Scholar
  • Mao J, Bojinov I (2021) Quantifying the value of iterative experimentation. Preprint, submitted November 3, https://arxiv.org/abs/2111.02334.Google Scholar
  • McFarland C (2012) Experiment!: Website Conversion Rate Optimization with A/B and Multivariate Testing (New Riders, Berkeley, CA).Google Scholar
  • McMahan HB, Blum A (2004) Online geometric optimization in the bandit setting against an adaptive adversary. Learn. Theory: 17th Annual Conf. Learning Theory, COLT 2004, Banff, Canada, July 1–4, 2004. Proc., vol. 17 (Springer, Berlin, Heidelberg), 109–123. Google Scholar
  • Neu G (2015) First-order regret bounds for combinatorial semi-bandits. Conf. Learn. Theory (PMLR, New York), 1360–1375.Google Scholar
  • Neu G, Bartók G (2016) Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits. J. Machine Learn. Res. 17(154):1–21.Google Scholar
  • Oli N, Patel A, Sharma V, Dacharaju SD, Ikhar S (2020) Personalizing multi-modal content for a diverse audience: A scalable deep learning approach. Accessed August 25, 2020, https://irsworkshop.github.io/2020/publications/paper_18_Oli_MultiModal.pdf.Google Scholar
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.CrossrefGoogle Scholar
  • Petrone S, Wasserman L (2002) Consistency of Bernstein polynomial posteriors. J. Roy. Statist. Soc. Ser. B (Statist. Methodology) 64(1):79–100.CrossrefGoogle Scholar
  • Pocock SJ (2013) Clinical Trials: A Practical Approach (John Wiley & Sons, Hoboken, NJ).CrossrefGoogle Scholar
  • Radlinski F, Kleinberg R, Joachims T (2008) Learning diverse rankings with multi-armed bandits. Proc. 25th Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 784–791.Google Scholar
  • Sankararaman KA, Slivkins A (2018) Combinatorial semi-bandits with knapsacks. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 1760–1770.Google Scholar
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.LinkGoogle Scholar
  • Seznec J, Locatelli A, Carpentier A, Lazaric A, Valko M (2019) Rotting bandits are no harder than stochastic ones. 22nd Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2564–2572.Google Scholar
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.CrossrefGoogle Scholar
  • Terwiesch C, Olivares M, Staats BR, Gaur V (2020) OM Forum—A review of empirical operations management over the last two decades. Manufacturing Service Oper. Management 22(4):656–668.LinkGoogle Scholar
  • Thomke SH (2020) Experimentation Works: The Surprising Power of Business Experiments (Harvard Business Press, Brighton, MA).Google Scholar
  • Vuleta B (2021) How much data is created every day? https://seedscientific.com/how-much-data-is-created-every-day/.Google Scholar
  • Wang Y, Audibert JY, Munos R (2008) Algorithms for infinitely many-armed bandits. Adv. Neural Inform. Processing Systems 21.Google Scholar
  • Wei CY, Luo H (2018) More adaptive algorithms for adversarial bandits. Conf. Learn. Theory (PMLR, New York), 1263–1291.Google Scholar
  • Xiong R, Athey S, Bayati M, Imbens G (2019) Optimal experimental design for staggered rollouts. Preprint, submitted November 9, https://doi.org/10.2139/ssrn.3483934.Google Scholar
  • Xu Y, Duan W, Huang S (2018) SQR: Balancing speed, quality and risk in online experiments. Proc. 24th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 895–904.Google Scholar
  • Xu Y, Chen N, Fernandez A, Sinno O, Bhasin A (2015) From infrastructure to culture: A/b testing challenges in large scale social networks. Proc. 21th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 2227–2236.Google Scholar
  • Yang F, Ramdas A, Jamieson KG, Wainwright MJ (2017) A framework for multi-a(rmed)/b(andit) testing with online FDR control. Adv. Neural Inform. Processing Systems 30.Google Scholar
  • Zeng Z, Dai H, Zhang DJ, Zhang H, Zhang R, Xu Z, Shen ZJM (2022) The impact of social nudges on user-generated content for social network platforms. Management Sci. 69(9):5189–5208.LinkGoogle Scholar
  • Zhang X, Frazier PI (2021) Restless bandits with many arms: Beating the central limit theorem. Preprint, submitted July 25, https://arxiv.org/abs/2107.11911.Google Scholar
  • Zhang DJ, Dai H, Dong L, Qi F, Zhang N, Liu X, Liu Z, Yang J (2020) The long-term and spillover effects of price promotions on retailing platforms: Evidence from a large randomized experiment on Alibaba. Management Sci. 66(6):2589–2609.LinkGoogle Scholar
  • Zimmert J, Luo H, Wei CY (2019) Beating stochastic and adversarial semi-bandits optimally and simultaneously. Internat. Conf. Machine Learn. (PMLR, New York), 7683–7692.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.