Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits

Published Online:https://doi.org/10.1287/mnsc.2023.4895

References

  • Abbasi-Yadkori Y (2012) Online learning for linearly parametrized control problems. Ph.D. Dissertation. University of Alberta, CAN. Advisor(s) Csaba Szepesvari.Google Scholar
  • Agrawal S, Goyal N (2013a) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Artificial Intelligence and Statistics (PMLR, Cambridge, MA), 99–107.Google Scholar
  • Agrawal S, Goyal N (2013b) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. Internat. Conf. on Machine Learn. (PMLR, Cambridge, MA), 127–135.Google Scholar
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.Google Scholar
  • Ban GY, Rudin C (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.LinkGoogle Scholar
  • Bastani H, Bayati M (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.LinkGoogle Scholar
  • Bastani H, Bastani O, Kim C (2017) Interpreting predictive models for human-in-the-loop analytics. Preprint, submitted May 23, https://arxiv.org/abs/ 1705.08504.Google Scholar
  • Bastani H, Bayati M, Khosravi K (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.LinkGoogle Scholar
  • Bayati M, Braverman M, Gillam M, Mack KM, Ruiz G, Smith MS, Horvitz E (2014) Data-driven decisions for reducing readmissions for heart failure: General methodology and case study. PLoS One 9(10):e109264.CrossrefGoogle Scholar
  • Belloni A, Chernozhukov V (2011) High dimensional sparse econometric models: An introduction. Inverse Problems and High-Dimensional Estimation (Springer, Berlin), 121–156.CrossrefGoogle Scholar
  • Belloni A, Chernozhukov V, Hansen C (2014) Inference on treatment effects after selection among high-dimensional controls. Rev. Econom. Stud. 81(2):608–650.CrossrefGoogle Scholar
  • Bertsimas D, Mersereau AJ (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.LinkGoogle Scholar
  • Bickel PJ, Ritov Y, Tsybakov AB (2009) Simultaneous analysis of lasso and dantzig selector. Ann. Statist. 37(4):1705–1732.CrossrefGoogle Scholar
  • Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.Google Scholar
  • Carpentier A, Munos R (2012) Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. Lawrence ND, Girolami M, eds. Artificial Intelligence and Statistics (PMLR, Cambridge, MA), 190–198.Google Scholar
  • Chow SC, Chang M (2008) Adaptive design methods in clinical trials—A review. Orphanet J. Rare Diseases 3(1):1–13.CrossrefGoogle Scholar
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Gordon G, Dunson D, Dudík M, eds. Proc. 14th Internat. Conf. on Artificial Intelligence and Statist. (PMLR, Cambridge, MA), 208–214.Google Scholar
  • Cover TM, Thomas JA (2006) Elements of Information Theory, 2nd ed. (Wiley, New York).Google Scholar
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using thompson sampling. Oper. Res. 66(6):1586–1602.LinkGoogle Scholar
  • Filippi S, Cappe O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 586–594.Google Scholar
  • Gao Z, Han Y, Ren Z, Zhou Z (2019) Batched multi-armed bandits problem. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 501–511.Google Scholar
  • Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.LinkGoogle Scholar
  • Han Y, Zhou Z, Zhou Z, Blanchet J, Glynn PW, Ye Y (2020) Sequential batch learning in finite-action linear contextual bandits. Preprint, submitted April 14, https://arxiv.org/abs/ 2004.06321.Google Scholar
  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • Hopp WJ, Li J, Wang G (2018) Big data and the precision medicine revolution. Production Oper. Management 27(9):1647–1664.CrossrefGoogle Scholar
  • Joachims T, Swaminathan A, Rijke MD (2018) Deep learning with logged bandit feedback. Bengio Y, LeCun Y, eds. Proc. Internat. Conf. on Learn. Representations (OpenReview, Amherst, MA).Google Scholar
  • Kallus N, Zhou A (2018) Confounding-robust policy improvement. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Advances in neural information processing systems 31 (Curran Associates, Inc, Red Hook, NY).Google Scholar
  • Kim ES, Herbst RS, Wistuba II, Lee JJ, Blumenschein GR, Tsao A, Stewart DJ, et al. (2011) The battle trial: Personalizing therapy for lung cancer. Cancer Discovery 1(1):44–53.CrossrefGoogle Scholar
  • Kim GS, Paik MC (2019) Doubly-robust lasso bandit. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 5877–5887.Google Scholar
  • Kitagawa T, Tetenov A (2018) Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86(2):591–616.CrossrefGoogle Scholar
  • Lattimore T, Szepesvári C (2020) Bandit algorithms (Cambridge University Press, Cambridge, UK).Google Scholar
  • Li K, Yang Y, Narisetty NN (2021) Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit. Electronic J. Statist. 15(2):5652–5695.Google Scholar
  • Miao S, Chao X (2019) Fast algorithms for online personalized assortment optimization in a big data regime. Preprint, submitted August 5, https://dx.doi.org/10.2139/ssrn.3432574.Google Scholar
  • Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2017) Behavioral analytics for myopic agents. Preprint, submitted February 17, https://arxiv.org/abs/ 1702.05496.Google Scholar
  • Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2020) Nonstationary bandits with habituation and recovery dynamics. Oper. Res.LinkGoogle Scholar
  • Naik P, Wedel M, Bacon L, Bodapati A, Bradlow E, Kamakura W, Kreulen J, et al. (2008) Challenges and opportunities in high-dimensional choice data analyses. Marketing Lett. 19(3–4):201.CrossrefGoogle Scholar
  • Oh Mh, Iyengar G, Zeevi A (2021) Sparsity-agnostic lasso bandit. Meila M, Zhang Tong, eds. Proc. Internat. Conf. on Machine Learn. (PMLR, Cambridge, MA), 8271–8280.Google Scholar
  • Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, Holmes J, et al. (2018) Adaptive designs in clinical trials: Why use them, and how to run and report them. BMC Medicine 16(1):1–15.CrossrefGoogle Scholar
  • Perchet V, Rigollet P, Chassang S, Snowberg E (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.CrossrefGoogle Scholar
  • Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, Sontag D (2015) Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 3(4):277–287.CrossrefGoogle Scholar
  • Rigollet P (2015) 18. s997: High Dimensional Statistics (MIT Open-CourseWare, Cambridge, MA).Google Scholar
  • Rigollet P, Zeevi A (2010) Nonparametric bandits with covariates. Kalai AT, Mohri M, eds. Conference On Learning Theory (Omnipress, Norristown, PA), 54–66.Google Scholar
  • Robbins H (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Series) 58(5):527–535.CrossrefGoogle Scholar
  • Rudelson M, Vershynin R (2015) Small ball probabilities for linear images of high-dimensional distributions. Internat. Math. Res. Not. IMRN 2015(19):9594–9617.CrossrefGoogle Scholar
  • Russo D, Van Roy B (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.Google Scholar
  • Schwartz EM, Bradlow ET, Fader PS (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.LinkGoogle Scholar
  • Slivkins A (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.CrossrefGoogle Scholar
  • Swaminathan A, Joachims T (2015) Batch learning from logged bandit feedback through counterfactual risk minimization. J. Machine Learn. Res. 16(52):1731–1755.Google Scholar
  • Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Wang X, Wei M, Yao T (2018) Minimax concave penalized multi-armed bandit model with high-dimensional covariates. Proc. Internat. Conf. on Machine Learn., 5200–5208.Google Scholar
  • Zhao YQ, Zeng D, Laber EB, Song R, Yuan M, Kosorok MR (2014) Doubly robust learning for estimating individualized treatment with censored data. Biometrika 102(1):151–168.CrossrefGoogle Scholar
  • Zhou M, Fukuoka Y, Mintz Y, Goldberg K, Kaminsky P, Flowers E, Aswani A (2018) Evaluating machine learning–based automated personalized daily step goals delivered through a mobile phone app: Randomized controlled trial. JMIR Mhealth Uhealth 6(1):e28.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.