Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making

