Adaptive Sequential Experiments with Unknown Information Arrival Processes

Yonatan Gur (Corresponding Author)
https://orcid.org/0000-0003-0764-3570
Stanford University, Stanford, California 94305

Ahmadreza Momeni
https://orcid.org/0000-0002-0575-7016
Stanford University, Stanford, California 94305
Published Online: 10 Jun 2022
https://doi.org/10.1287/msom.2022.1116


Manufacturing & Service Operations Management
Volume 24, Issue 5, September-October 2022
Issue pages 2387-2796, C2

Received: July 19, 2021
Accepted: April 27, 2022
Published Online: June 10, 2022

Cite as

Yonatan Gur, Ahmadreza Momeni (2022) Adaptive Sequential Experiments with Unknown Information Arrival Processes. Manufacturing & Service Operations Management 24(5):2666-2684.

https://doi.org/10.1287/msom.2022.1116
