Adaptive Sequential Experiments with Unknown Information Arrival Processes
Published Online: 10 Jun 2022. https://doi.org/10.1287/msom.2022.1116
References
- (2005) Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowledge Data Engrg. 17(6):734–749.
- (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Proc. Machine Learning Res. 23:39.1–39.26.
- (2013a) Further optimal regret bounds for Thompson sampling. Proc. Machine Learning Res. 31:99–107.
- (2013b) Thompson sampling for contextual bandits with linear payoffs. Proc. Machine Learning Res. 28(3):127–135.
- (2022) Adaptive clinical trial designs with surrogates: When should we bother? Management Sci. 68(3):1982–2002.
- (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2–3):235–256.
- (2013) Sequential transfer in multi-armed bandit with finite set of models. Preprint, submitted July 25, https://doi.org/10.48550/arXiv.1307.6887.
- (2021a) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
- (2022) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
- (2021b) Learning personalized product recommendations with customer disengagement. Manufacturing Service Oper. Management, ePub ahead of print December 9, https://doi.org/10.1287/msom.2021.1047.
- (1972) A Bernoulli two-armed bandit. Ann. Math. Statist. 43(3):871–897.
- (2016) An analytics approach to designing combination chemotherapy regimens for cancer. Management Sci. 62(5):1511–1531.
- (2013) On implications of demand censoring in the newsvendor problem. Management Sci. 59(6):1407–1424.
- (2016) Optimization in online content recommendation services: Beyond click-through rates. Manufacturing Service Oper. Management 18(1):15–33.
- (2019) Optimal exploration-exploitation in a multi-armed-bandit problem with non-stationary rewards. Stochastic Systems 9(4):319–337.
- (2020) Online pricing with offline data: Phase transition and inverse square law. Proc. Machine Learning Res. 119:1202–1210.
- (2013) Bounded regret in stochastic multi-armed bandits. Proc. Machine Learning Res. 30:122–134.
- (2007) Dynamic assortment with demand learning for seasonal consumer goods. Management Sci. 53(2):276–292.
- (2020) Managing online content to build a follower base: Model and applications. INFORMS J. Optim. 2(1):57–77.
- (2006) Regret minimization under partial monitoring. Math. Oper. Res. 31(3):562–580.
- (2011) Contextual bandits with linear payoff functions. Proc. Machine Learning Res. 15:208–214.
- (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learning Theory, Helsinki, Finland, July 9–12, 355–366.
- (2018) Bandits with side observations: Bounded vs. logarithmic regret. Preprint, submitted July 10, https://doi.org/10.48550/arXiv.1807.03558.
- (1994) Statistical estimation and optimal recovery. Ann. Statist. 22(1):238–270.
- (2019) Learning preferences with side information. Management Sci. 65(7):3131–3149.
- (1997) A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55(1):119–139.
- (2019) Batched multi-armed bandits problem. Preprint, submitted April 3, https://doi.org/10.48550/arXiv.1904.01763.
- (2019) Explore first, exploit next: The true shape of regret in bandit problems. Math. Oper. Res. 44(2):377–399.
- (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.
- (2018) Adaptive learning with unknown information flows. Adv. Neural Inform. Processing Systems 31:7473–7482.
- (2012) Bayesian dynamic pricing policies: Learning and earning under a binary prior distribution. Management Sci. 58(3):570–586.
- (2003) The value of knowing a demand curve: Bounds on regret for online posted-price auctions. Proc. 44th Annual IEEE Sympos. Foundations Comput. Sci. (IEEE, Piscataway, NJ), 594–605.
- (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4–22.
- (2005) Contextual recommender problems. Proc. First Internat. Workshop Utility-Based Data Mining (ACM, New York), 86–89.
- (2009) A survey on transfer learning. IEEE Trans. Knowledge Data Engrg. 22(10):1345–1359.
- (2007) Bandits for taxonomies: A model-based approach. Proc. 2007 SIAM Internat. Conf. Data Mining (SIAM, Philadelphia), 216–227.
- (2009) Pairwise preference regression for cold-start recommendation. Proc. Third ACM Conf. Recommender Systems (ACM, New York), 21–28.
- (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58(5):527–535.
- (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
- (2018) A Tutorial on Thompson Sampling (Now, Boston).
- (2002) Methods and metrics for cold-start recommendations. Proc. 25th Annual Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 253–260.
- (2015) Estimating the causal impact of recommendation systems from observational data. Proc. 16th ACM Conf. Econom. Comput. (ACM, New York), 453–470.
- (2012) Multi-armed bandit problems with history. Proc. Machine Learning Res. 22:1046–1054.
- (2014) Ensemble contextual bandits for personalized recommendation. Proc. Eighth ACM Conf. Recommender Systems (ACM, New York), 73–80.
- (1933) On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4):285–294.
- (2009) Introduction to Nonparametric Estimation (Springer, New York).
- (2017) BiUCB: A contextual bandit algorithm for cold-start and diversified recommendation. Proc. 2017 IEEE Internat. Conf. Big Knowledge (ICBK) (IEEE, Piscataway, NJ), 248–253.
- (1979) A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74(368):799–806.
- (1969) Play the winner rule and the controlled clinical trial. J. Amer. Statist. Assoc. 64(325):131–146.
- (2010) Crafting integrated multichannel retailing strategies. J. Interactive Marketing 24(2):168–180.

