Adaptive Sequential Experiments with Unknown Information Arrival Processes

Published Online: https://doi.org/10.1287/msom.2022.1116

References

  • Adomavicius G, Tuzhilin A (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowledge Data Engrg. 17(6):734–749.
  • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Proc. Machine Learning Res. 23:39.1–39.26.
  • Agrawal S, Goyal N (2013a) Further optimal regret bounds for Thompson sampling. Proc. Machine Learning Res. 31:99–107.
  • Agrawal S, Goyal N (2013b) Thompson sampling for contextual bandits with linear payoffs. Proc. Machine Learning Res. 28(3):127–135.
  • Anderer A, Bastani H, Silberholz J (2022) Adaptive clinical trial designs with surrogates: When should we bother? Management Sci. 68(3):1982–2002.
  • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2–3):235–256.
  • Azar MG, Lazaric A, Brunskill E (2013) Sequential transfer in multi-armed bandit with finite set of models. Preprint, submitted July 25, https://doi.org/10.48550/arXiv.1307.6887.
  • Bastani H, Bayati M, Khosravi K (2021a) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
  • Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Bastani H, Harsha P, Perakis G, Singhvi D (2021b) Learning personalized product recommendations with customer disengagement. Manufacturing Service Oper. Management, ePub ahead of print December 9, https://doi.org/10.1287/msom.2021.1047.
  • Berry DA (1972) A Bernoulli two-armed bandit. Ann. Math. Statist. 43(3):871–897.
  • Bertsimas D, O’Hair A, Relyea S, Silberholz J (2016) An analytics approach to designing combination chemotherapy regimens for cancer. Management Sci. 62(5):1511–1531.
  • Besbes O, Muharremoglu A (2013) On implications of demand censoring in the newsvendor problem. Management Sci. 59(6):1407–1424.
  • Besbes O, Gur Y, Zeevi A (2016) Optimization in online content recommendation services: Beyond click-through rates. Manufacturing Service Oper. Management 18(1):15–33.
  • Besbes O, Gur Y, Zeevi A (2019) Optimal exploration-exploitation in a multi-armed-bandit problem with non-stationary rewards. Stochastic Systems 9(4):319–337.
  • Bu J, Simchi-Levi D, Xu Y (2020) Online pricing with offline data: Phase transition and inverse square law. Proc. Machine Learning Res. 119:1202–1210.
  • Bubeck S, Perchet V, Rigollet P (2013) Bounded regret in stochastic multi-armed bandits. Proc. Machine Learning Res. 30:122–134.
  • Caro F, Gallien J (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
  • Caro F, Martínez de Albéniz V (2020) Managing online content to build a follower base: Model and applications. INFORMS J. Optim. 2(1):57–77.
  • Cesa-Bianchi N, Lugosi G, Stoltz G (2006) Regret minimization under partial monitoring. Math. Oper. Res. 31(3):562–580.
  • Chu W, Li L, Reyzin L, Schapire RE (2011) Contextual bandits with linear payoff functions. Proc. Machine Learning Res. 15:208–214.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learning Theory, Helsinki, Finland, July 9–12, 355–366.
  • Degenne R, Garcelon E, Perchet V (2018) Bandits with side observations: Bounded vs. logarithmic regret. Preprint, submitted July 10, https://doi.org/10.48550/arXiv.1807.03558.
  • Donoho DL (1994) Statistical estimation and optimal recovery. Ann. Statist. 22(1):238–270.
  • Farias VF, Li AA (2019) Learning preferences with side information. Management Sci. 65(7):3131–3149.
  • Freund Y, Schapire RE (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.
  • Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Preprint, submitted April 3, https://doi.org/10.48550/arXiv.1904.01763.
  • Garivier A, Ménard P, Stoltz G (2019) Explore first, exploit next: The true shape of regret in bandit problems. Math. Oper. Res. 44(2):377–399.
  • Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.
  • Gur Y, Momeni A (2018) Adaptive learning with unknown information flows. Adv. Neural Inform. Processing Systems 31:7473–7482.
  • Harrison JM, Keskin NB, Zeevi A (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
  • Kleinberg R, Leighton T (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 594–605.
  • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
  • Madani O, DeCoste D (2005) Contextual recommender problems. Proc. First Internat. Workshop Utility-Based Data Mining (ACM, New York), 86–89.
  • Pan SJ, Yang Q (2009) A survey on transfer learning. IEEE Trans. Knowledge Data Engrg. 22(10):1345–1359.
  • Pandey S, Agarwal D, Chakrabarti D, Josifovski V (2007) Bandits for taxonomies: A model-based approach. Proc. 2007 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 216–227.
  • Park S-T, Chu W (2009) Pairwise preference regression for cold-start recommendation. Proc. Third ACM Conf. Recommender Systems (ACM, New York), 21–28.
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Russo DJ, Van Roy B, Kazerouni A, Osband I, Wen Z (2018) A Tutorial on Thompson Sampling (Now, Boston).
  • Schein AI, Popescul A, Ungar LH, Pennock DM (2002) Methods and metrics for cold-start recommendations. Proc. 25th Annual Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 253–260.
  • Sharma A, Hofman JM, Watts DJ (2015) Estimating the causal impact of recommendation systems from observational data. Proc. 16th ACM Conf. Econom. Comput. (ACM, New York), 453–470.
  • Shivaswamy P, Joachims T (2012) Multi-armed bandit problems with history. Proc. Machine Learning Res. 22:1046–1054.
  • Tang L, Jiang Y, Li L, Li T (2014) Ensemble contextual bandits for personalized recommendation. Proc. Eighth ACM Conf. Recommender Systems (ACM, New York), 73–80.
  • Thompson WR (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
  • Tsybakov AB (2009) Introduction to Nonparametric Estimation (Springer, New York).
  • Wang L, Wang C, Wang K, He X (2017) BiUCB: A contextual bandit algorithm for cold-start and diversified recommendation. Proc. 2017 IEEE Internat. Conf. Big Knowledge (ICBK) (IEEE, Piscataway, NJ), 248–253.
  • Woodroofe M (1979) A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74(368):799–806.
  • Zelen M (1969) Play the winner rule and the controlled clinical trial. J. Amer. Statist. Assoc. 64(325):131–146.
  • Zhang J, Farris PW, Irvin JW, Kushwaha T, Steenburgh TJ, Weitz BA (2010) Crafting integrated multichannel retailing strategies. J. Interactive Marketing 24(2):168–180.