Online Decision Making with High-Dimensional Covariates

Hamsa Bastani (Corresponding Author)
https://orcid.org/0000-0002-8793-4732
Wharton School, Operations Information and Decisions, University of Pennsylvania, Philadelphia, Pennsylvania 19104

Mohsen Bayati
https://orcid.org/0000-0002-7280-912X
Stanford Graduate School of Business, Stanford University, Stanford, California 94305

Published Online: 7 Nov 2019
https://doi.org/10.1287/opre.2019.1902

References

Abbasi-Yadkori Y (2012) Online learning for linearly parametrized control problems. PhD thesis, University of Alberta, Edmonton, AB, Canada.
Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 2312–2320.
Abbasi-Yadkori Y, Pál D, Szepesvári C (2012) Online-to-confidence-set conversions and application to sparse stochastic bandits. Proc. Machine Learn. Res. 22:1–9.
Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. Machine Learn. Res. 28:127–135.
Athey S, Imbens GW, Wager S (2016) Approximate residual balancing: de-biased inference of average treatment effects in high dimensions. Working paper, Stanford University, Stanford, CA.
Auer P (2003) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3:397–422.
Ban G-Y, Rudin C (2019) The big data newsvendor: practical insights from machine learning. Oper. Res. 67(1):90–108.
Bayati M, Braverman M, Gillam M, Mack K, Ruiz G, Smith M, Horvitz E (2014) Data-driven decisions for reducing readmissions for heart failure: general methodology and case study. PLoS One 9(10):e109264.
Belloni A, Chernozhukov V, Hansen C (2014) Inference on treatment effects after selection among high-dimensional controls. Rev. Econom. Stud. 81(2):608–650.
Bertsimas D, Kallus N (2014) From predictive to prescriptive analytics. Working paper, Massachusetts Institute of Technology, Cambridge.
Bickel P, Ritov Y, Tsybakov A (2009) Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37(4):1705–1732.
Bubeck S, Cesa-Bianchi N (2012) Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations Trends Machine Learn. 5(1):1–122.
Budnitz DS, Pollock DA, Weidenbach KN, Mendelson AB, Schroeder TJ, Annest JL (2006) National surveillance of emergency department visits for outpatient adverse drug events. JAMA 296:1858–1866.
Bühlmann P, Van De Geer S (2011) Statistics for High-Dimensional Data: Methods, Theory and Applications (Springer Science & Business Media, New York).
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when p is much larger than n. Ann. Statist. 35(6):2313–2351.
Carpentier A, Munos R (2012) Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. 15th International Conference on Artificial Intelligence and Statistics (AISTATS), La Palma, Canary Islands, 190–198.
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20(1):33–61.
Chen X, Owen Z, Pixton C, Simchi-Levi D (2015) A statistical learning approach to personalization in revenue management. Working paper, New York University, New York.
Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. Machine Learn. Res. 15:208–214.
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Servedio RA, Zhang T, eds. Proc. Conf. Learn. Theory (Omnipress, Madison, WI), 355–366.
Deshpande Y, Montanari A (2012) Linear bandits in high dimension and recommendation systems. Preprint, submitted January 8, https://arxiv.org/abs/1301.1722.
Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap (Chapman Hall, New York).
Elmachtoub AN, McNellis R, Oh S, Petrik M (2017) A practical method for solving contextual bandit problems using decision trees. Proc. 33rd Conf. Uncertainty Artificial Intelligence (UAI), Sydney, Australia, 11–15.
Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stochastic Systems 3(1):230–261.
Hastie T, Tibshirani R, Friedman J (2001) The Elements of Statistical Learning (Springer, New York).
He B, Dexter F, Macario A, Zenios S (2012) The timing of staffing decisions in hospital operating rooms: incorporating workload heterogeneity into the newsvendor problem. Manufacturing Service Oper. Management 14(1):99–114.
International Warfarin Pharmacogenetics Consortium, Klein TE, Altman RB, Eriksson N, Gage BF, Kimmel SE, Lee MT, et al. (2009) Estimation of the warfarin dose with clinical and pharmacogenetic data. New England J. Medicine 360(8):753–764.
Kallus N, Udell M (2016) Dynamic assortment personalization in high dimensions. Preprint, arXiv:1610.05604.
Kim ES, Herbst RS, Wistuba II, Lee JJ, Blumenschein GR, Tsao A, Stewart DJ, et al. (2011) The battle trial: personalizing therapy for lung cancer. Cancer Discovery 1(1):44–53.
Kivinen J, Warmuth MK (1997) Exponentiated gradient vs. gradient descent for linear predictors. Inform. Comput. 132(1):1–63.
Langford J, Zhang T (2008) The epoch-greedy algorithm for multi-armed bandits with side information. Platt JC, Koller D, Singer Y, Roweis ST, eds. Advances in Neural Information Processing Systems, vol. 20 (Curran Associates, Red Hook, NY), 817–824.
Naik P, Wedel M, Bacon L, Bodapati A, Bradlow E, Kamakura W, Kreulen J, Lenk P, Madigan DM, Montgomery A (2008) Challenges and opportunities in high-dimensional choice data analyses. Marketing Lett. 19(3–4):201–213.
Negahban SN, Ravikumar P, Wainwright MJ, Yu B (2012) A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. Statist. Sci. 27(4):538–557.
Perchet V, Rigollet P (2013) The multi-armed bandit problem with covariates. Ann. Statist. 41(2):693–721.
Razavian N, Blecker S, Schmidt AM, Smith-McLallen A, Nigam S, Sontag D (2015) Population-level prediction of type 2 diabetes from claims data and analysis of risk factors. Big Data 3(4):277–287.
Rigollet P, Zeevi A (2010) Nonparametric bandits with covariates. Kalai AT, Mohri M, eds. Proc. Conf. Learn. Theory (Omnipress, Madison, WI), 54–66.
Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
Russo D, Van Roy B (2014a) Learning to optimize via information-directed sampling. Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 27 (Curran Associates, Red Hook, NY), 1583–1591.
Russo D, Van Roy B (2014b) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
Slivkins A (2014) Contextual bandits with similarity information. J. Mach. Learn. Res. 15(1):2533–2568.
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Series B (Methodological) 58(1):267–288.
Tropp J (2015) An introduction to matrix concentration inequalities. Foundations Trends Machine Learn. 8(1–2):1–230.
Tsybakov AB (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32(1):135–166.
Wysowski DK, Nourjah P, Swartz L (2007) Bleeding complications with warfarin use: a prevalent adverse effect resulting in regulatory action. Internal Medicine 167(13):1414–1419.
Yan L, Li W-J, Xue G-R, Han D (2014) Coupled group lasso for web-scale CTR prediction in display advertising. Proc. Machine Learn. Res. 32(2):802–810.

Volume 68, Issue 1

January-February 2020

Pages 1-307, C2-C3

Received: March 31, 2017
Accepted: June 6, 2019
Published Online: November 7, 2019

Cite as

Hamsa Bastani, Mohsen Bayati (2019) Online Decision Making with High-Dimensional Covariates. Operations Research 68(1):276-294.

https://doi.org/10.1287/opre.2019.1902
