Dynamic Batch Learning in High-Dimensional Sparse Linear Contextual Bandits
Published Online: 19 Oct 2023. https://doi.org/10.1287/mnsc.2023.4895
References
- (2012) Online learning for linearly parametrized control problems. Ph.D. Dissertation. University of Alberta, CAN. Advisor: Csaba Szepesvári.
- (2013a) Further optimal regret bounds for Thompson sampling. Carvalho CM, Ravikumar P, eds. Artificial Intelligence and Statistics (PMLR, Cambridge, MA), 99–107.
- (2013b) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. Internat. Conf. on Machine Learn. (PMLR, Cambridge, MA), 127–135.
- (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(Nov):397–422.
- (2019) The big data newsvendor: Practical insights from machine learning. Oper. Res. 67(1):90–108.
- (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.
- (2017) Interpreting predictive models for human-in-the-loop analytics. Preprint, submitted May 23, https://arxiv.org/abs/1705.08504.
- (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
- (2014) Data-driven decisions for reducing readmissions for heart failure: General methodology and case study. PLoS One 9(10):e109264.
- (2011) High dimensional sparse econometric models: An introduction. Inverse Problems and High-Dimensional Estimation (Springer, Berlin), 121–156.
- (2014) Inference on treatment effects after selection among high-dimensional controls. Rev. Econom. Stud. 81(2):608–650.
- (2007) A learning approach for interactive marketing to a customer segment. Oper. Res. 55(6):1120–1135.
- (2009) Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37(4):1705–1732.
- (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
- (2012) Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. Lawrence ND, Girolami M, eds. Artificial Intelligence and Statistics (PMLR, Cambridge, MA), 190–198.
- (2008) Adaptive design methods in clinical trials—A review. Orphanet J. Rare Diseases 3(1):1–13.
- (2011) Contextual bandits with linear payoff functions. Gordon G, Dunson D, Dudík M, eds. Proc. 14th Internat. Conf. on Artificial Intelligence and Statist. (PMLR, Cambridge, MA), 208–214.
- (2006) Elements of Information Theory, 2nd ed. (Wiley, New York).
- (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
- (2010) Parametric bandits: The generalized linear case. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 586–594.
- (2019) Batched multi-armed bandits problem. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 501–511.
- (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.
- (2020) Sequential batch learning in finite-action linear contextual bandits. Preprint, submitted April 14, https://arxiv.org/abs/2004.06321.
- (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).
- (2018) Big data and the precision medicine revolution. Production Oper. Management 27(9):1647–1664.
- (2018) Deep learning with logged bandit feedback. Bengio Y, LeCun Y, eds. Proc. Internat. Conf. on Learn. Representations (OpenReview, Amherst, MA).
- (2018) Confounding-robust policy improvement. Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Adv. Neural Inform. Processing Systems 31 (Curran Associates, Inc., Red Hook, NY).
- (2011) The BATTLE trial: Personalizing therapy for lung cancer. Cancer Discovery 1(1):44–53.
- (2019) Doubly-robust lasso bandit. Wallach H, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox E, Garnett R, eds. Adv. Neural Inform. Processing Systems (Curran Associates, Inc., Red Hook, NY), 5877–5887.
- (2018) Who should be treated? Empirical welfare maximization methods for treatment choice. Econometrica 86(2):591–616.
- (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
- (2021) Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit. Electronic J. Statist. 15(2):5652–5695.
- (2019) Fast algorithms for online personalized assortment optimization in a big data regime. Preprint, submitted August 5, https://dx.doi.org/10.2139/ssrn.3432574.
- (2017) Behavioral analytics for myopic agents. Preprint, submitted February 17, https://arxiv.org/abs/1702.05496.
- (2020) Nonstationary bandits with habituation and recovery dynamics. Oper. Res.
- et al. (2008) Challenges and opportunities in high-dimensional choice data analyses. Marketing Lett. 19(3–4):201.
- (2021) Sparsity-agnostic lasso bandit. Meila M, Zhang T, eds. Proc. Internat. Conf. on Machine Learn. (PMLR, Cambridge, MA), 8271–8280.
- (2018) Adaptive designs in clinical trials: Why use them, and how to run and report them. BMC Medicine 16(1):1–15.
- (2016) Batched bandit problems. Ann. Statist. 44(2):660–681.
- (2015) Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 3(4):277–287.
- (2015) 18.S997: High Dimensional Statistics (MIT OpenCourseWare, Cambridge, MA).
- (2010) Nonparametric bandits with covariates. Kalai AT, Mohri M, eds. Conference on Learning Theory (Omnipress, Norristown, PA), 54–66.
- (1952) Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (New Series) 58(5):527–535.
- (2015) Small ball probabilities for linear images of high-dimensional distributions. Internat. Math. Res. Not. IMRN 2015(19):9594–9617.
- (2016) An information-theoretic analysis of Thompson sampling. J. Machine Learn. Res. 17(1):2442–2471.
- (2017) Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Sci. 36(4):500–522.
- (2019) Introduction to multi-armed bandits. Foundations Trends Machine Learn. 12(1–2):1–286.
- (2015) Batch learning from logged bandit feedback through counterfactual risk minimization. J. Machine Learn. Res. 16(52):1731–1755.
- (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).
- (2018) Minimax concave penalized multi-armed bandit model with high-dimensional covariates. Proc. Internat. Conf. on Machine Learn., 5200–5208.
- (2014) Doubly robust learning for estimating individualized treatment with censored data. Biometrika 102(1):151–168.
- (2018) Evaluating machine learning–based automated personalized daily step goals delivered through a mobile phone app: Randomized controlled trial. JMIR Mhealth Uhealth 6(1):e28.

