Estimation Errors as Regret Lower Bounds for Linear Contextual Bandits

Published Online:https://doi.org/10.1287/mnsc.2023.02827

References

  • Acharya J, Sun Z, Zhang H (2021) Differentially private Assouad, Fano, and Le Cam. Proc. 32nd Internat. Conf. Algorithmic Learn. Theory (PMLR, New York), 48–78.Google Scholar
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Machine Learn. (PMLR, New York), 127–135.Google Scholar
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.Google Scholar
  • Bastani H, Bayati M (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.LinkGoogle Scholar
  • Bastani H, Bayati M, Khosravi K (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.LinkGoogle Scholar
  • Bryc W (1995) Rotation invariant distributions. The Normal Distribution: Characterizations with Applications, Lecture Notes in Statistics, vol. 100 (Springer, New York), 51–69.CrossrefGoogle Scholar
  • Cai TT, Wang Y, Zhang L (2020) The cost of privacy in generalized linear models: Algorithms and minimax lower bounds. Preprint, submitted December 6, https://arxiv.org/abs/2011.03900.Google Scholar
  • Cai TT, Wang Y, Zhang L (2021) The cost of privacy: Optimal rates of convergence for parameter estimation with differential privacy. Ann. Statist. 49(5):2825–2850.CrossrefGoogle Scholar
  • Chen X, Simchi-Levi D, Wang Y (2022) Privacy-preserving dynamic personalized pricing with demand learning. Management Sci. 68(7):4878–4898.LinkGoogle Scholar
  • Chen Y, Wang Y, Fang EX, Wang Z, Li R (2024) Nearly dimension-independent sparse linear bandit over small action spaces via best subset selection. J. Amer. Statist. Assoc. 119(545):246–258.CrossrefGoogle Scholar
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 208–214.Google Scholar
  • Duchi J (2016) Lecture notes for Statistics 311/Electrical Engineering 377. Accessed April 18, 2026, https://stanford.edu/class/stats311/OldSyllabi/full_notes.pdf.Google Scholar
  • Duchi JC, Wainwright MJ (2013) Distance-based and continuum Fano inequalities with applications to statistical estimation. Preprint, submitted December 31, https://arxiv.org/abs/1311.2669.Google Scholar
  • Duchi JC, Jordan MI, Wainwright MJ (2018) Minimax optimal procedures for locally private estimation. J. Amer. Statist. Assoc. 113(521):182–201.CrossrefGoogle Scholar
  • Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.LinkGoogle Scholar
  • Han Y, Liang Z, Wang Y, Zhang J (2021) Generalized linear bandits with local differential privacy. Adv. Neural Inform. Processing Systems 34:26511–26522.Google Scholar
  • Han Y, Zhou Z, Zhou Z, Blanchet J, Glynn PW, Ye Y (2020) Sequential batch learning in finite-action linear contextual bandits. Preprint, submitted April 14, https://arxiv.org/abs/2004.06321.Google Scholar
  • He J, Zhang J, Zhang RQ (2022) A reduction from linear contextual bandits lower bounds to estimations lower bounds. Proc. 39th Internat. Conf. Machine Learn. (PMLR, New York), 8660–8677.Google Scholar
  • Kannan S, Morgenstern JH, Roth A, Waggoner B, Wu ZS (2018) A smoothed analysis of the greedy algorithm for the linear contextual bandit problem. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (Curran Associates Inc., Red Hook, NY), 2231–2241. Google Scholar
  • Kim W, Lee K, Paik MC (2023) Double doubly robust Thompson sampling for generalized linear contextual bandits. Proc. 37th AAAI Conf. Artificial Intelligence and 35th Conf. Innovative Appl. Artificial Intelligence and 13th Sympos. Educational Adv. Artificial Intelligence (AAAI Press, Palo Alto, CA), 8300–8307. Google Scholar
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • Li W, Barik A, Honorio J (2022) A simple unified framework for high dimensional bandit problems. Proc. 39th Internat. Conf. Machine Learn. (PMLR, New York), 12619–12655.Google Scholar
  • Li Y, Wang Y, Zhou Y (2019) Nearly minimax-optimal regret for linearly parameterized bandits. Proc. 32nd Conf. Learn. Theory (PMLR, New York), 2173–2174.Google Scholar
  • Li L, Chu W, Langford J, Schapire RE (2010) A contextual-bandit approach to personalized news article recommendation. WWW’10 Proc. 19th Internat. Conf. World Wide Web (Association for Computing Machinery, New York), 661–670.Google Scholar
  • Oh MH, Iyengar G (2021) Multinomial logit contextual bandits: Provable optimality and practicality. Proc. AAAI Conf. Artificial Intelligence 35(10):9205–9213.Google Scholar
  • Oh MH, Iyengar G, Zeevi A (2021) Sparsity-agnostic lasso bandit. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 8271–8280.Google Scholar
  • Ren Z, Zhou Z (2024) Dynamic batch learning in high-dimensional sparse linear contextual bandits. Management Sci. 70(2):1315–1342.LinkGoogle Scholar
  • Shariff R, Sheffet O (2018) Differentially private contextual linear bandits. Adv. Neural Inform. Processing Systems 31:4296–4306. Google Scholar
  • Wang D, Xu J (2019) On sparse linear regression in the local differential privacy model. Proc. 36th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 97 (PMLR, New York), 6628–6637.Google Scholar
  • Zheng K, Cai T, Huang W, Li Z, Wang L (2020) Locally differentially private (contextual) bandits learning. Adv. Neural Inform. Processing Systems 33:12300–12310.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.