Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making

Published Online: https://doi.org/10.1287/mnsc.2019.03211

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Proc. 24th Internat. Conf. Neural Inform. Processing Systems, NIPS’11 (Curran Associates Inc., Red Hook, NY), 2312–2320.
  • Abeille M, Lazaric A (2017) Linear Thompson sampling revisited. Electron. J. Statist. 11(2):5165–5197.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Proc. 30th Internat. Conf. Internat. Conf. Machine Learn., vol. 28 III, ICML’13 (JMLR.org, Atlanta, GA), 1220–1228.
  • Ahuja V, Birge JR (2016) Response-adaptive designs for clinical trials: Simultaneous learning from multiple patients. Eur. J. Oper. Res. 248(2):619–633.
  • American Diabetes Association (2019) Cardiovascular disease and risk management: Standards of medical care in diabetes—2019. Diabetes Care 42(Supplement 1):S103–S123.
  • Anderer A, Bastani H, Silberholz J (2020) Adaptive clinical trial designs with surrogates: When should we bother? Working paper, University of Michigan, Ann Arbor, MI.
  • Angus DC (2020) Optimizing the trade-off between learning and doing in a pandemic. JAMA 323(19):1895–1896.
  • Arauz-Pacheco C, Parrott MA, Raskin P (2004) Hypertension management in adults with diabetes. Diabetes Care 27:S65–S67.
  • Arnold SE, Betensky RA (2018) Multi-crossover randomized controlled trial designs in Alzheimer’s disease. Ann. Neurol. 84(2):168–175.
  • Auer P (2002) Using confidence bounds for exploitation-exploration trade-offs. J. Machine Learn. Res. 3(11):397–422.
  • Ayer T, Alagoz O, Stout NK, Burnside ES (2016) Heterogeneity in women’s adherence and its role in optimal breast cancer screening policies. Management Sci. 62(5):1339–1362.
  • Bai Y, Xie T, Jiang N, Wang YX (2019) Provably efficient Q-learning with low switching cost. Adv. Neural Inform. Processing Systems 32:8004–8013.
  • Bastani H, Bayati M (2020) Online decision making with high-dimensional covariates. Oper. Res. 68(1):276–294.
  • Bastani H, Bayati M, Khosravi K (2021) Mostly exploration-free algorithms for contextual bandits. Management Sci. 67(3):1329–1349.
  • Basu S, Sussman JB, Berkowitz SA, Hayward RA, Yudkin JS (2017) Development and validation of risk equations for complications of type 2 diabetes (RECODe) using individual participant data from randomised trials. Lancet Diabetes Endocrinol. 5(10):788–798.
  • Baucum M, Khojandi A, Vasudevan R, Ramdhani R (2023) Optimizing patient-specific medication regimen policies using wearable sensors in Parkinson’s disease. Management Sci. 69(10):5964–5982.
  • Bellamy L, Casas JP, Hingorani AD, Williams D (2009) Type 2 diabetes mellitus after gestational diabetes: A systematic review and meta-analysis. Lancet 373(9677):1773–1779.
  • Bertsimas D, Borenstein ARA, Dauvin A, Orfanoudaki A (2021) Ensemble machine learning for personalized antihypertensive treatment. Naval Res. Logist. (NRL) 69(5):669–688.
  • Bertsimas D, O’Hair A, Relyea S, Silberholz J (2016) An analytics approach to designing combination chemotherapy regimens for cancer. Management Sci. 62(5):1511–1531.
  • Bertsimas D, Zhuo YD (2020) Novel target discovery of existing therapies: Path to personalized cancer therapy. INFORMS J. Optim. 2(1):1–13.
  • Bouneffouf D, Rish I (2019) A survey on practical applications of multi-armed and contextual bandits. Preprint, submitted April 2, https://arxiv.org/abs/1904.10040.
  • Bubeck S, Munos R, Stoltz G, Szepesvári C (2011) X-armed bandits. J. Machine Learn. Res. 12(5):1655–1695.
  • Cao J, Gao R, Keyvanshokooh E (2025a) HR-Bandit: Human-AI collaborated linear recourse bandit. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York).
  • Cao J, Keyvanshokooh E, Liu T (2025b) Safe reinforcement learning with contextual information: Theory and applications. Preprint, submitted September 25, http://dx.doi.org/10.2139/ssrn.4583667.
  • Capan M, Khojandi A, Denton BT, Williams KD, Ayer T, Chhatwal J, Kurt M, et al. (2017) From data to improved decisions: Operations research in healthcare delivery. Med. Decis. Making 37(8):849–859.
  • Catala-Lopez F, Saint-Gerons DM, Gonzalez-Bermejo D, Rosano GM, Davis BR, Ridao M, Zaragoza A, et al. (2016) Cardiovascular and renal outcomes of renin–angiotensin system blockade in adult patients with diabetes mellitus: A systematic review with network meta-analyses. PLoS Med. 13(3):1–30.
  • Centers for Disease Control and Prevention (2020) Blood pressure medicines. Accessed June 6, 2022, https://www.cdc.gov/bloodpressure/medicines.htm.
  • Chan T, Narasimhan C, Xie Y (2013) Treatment effectiveness and side effects: A model of physician learning. Management Sci. 59(6):1309–1325.
  • Chehrazi N, Cipriano LE, Enns EA (2019) Dynamics of drug resistance: Optimal control of an infectious disease. Oper. Res. 67(3):619–650.
  • Chen W, Lu Y, Qiu L, Kumar S (2021) Designing personalized treatment plans for breast cancer. Inform. Systems Res. 32(3):932–949.
  • Cheung WC, Simchi-Levi D, Zhu R (2021) Hedging the drift: Learning to optimize under nonstationarity. Management Sci. 68(3):1696–1713.
  • Chick SE, Gans N, Yapar Ö (2021) Bayesian sequential learning for clinical trials of multiple correlated medical interventions. Management Sci. 68(7):4919–4938.
  • Chow SC (2014) Adaptive clinical trial design. Annu. Rev. Med. 65:405–415.
  • Chu W, Li L, Reyzin L, Schapire R (2011) Contextual bandits with linear payoff functions. Proc. Fourteenth Internat. Conf. Artificial Intelligence Statist. AISTATS’11 (JMLR, Ft. Lauderdale, FL), 208–214.
  • Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Working paper, University of Chicago, Chicago, IL.
  • Denton BT (2018) Optimization of sequential decision making for chronic diseases: From data to decisions. Recent Adv. Optim. Modeling Contemporary Problems (INFORMS, Catonsville, MD), 316–348.
  • Denton BT, Alagoz O, Holder A, Lee EK (2011) Medical decision making: Open research challenges. IIE Trans. Healthcare Systems Engrg. 1(3):161–167.
  • Epstein CCL (2014) An analytics approach to hypertension treatment. PhD thesis, Massachusetts Institute of Technology, Cambridge.
  • Ferreira KJ, Simchi-Levi D, Wang H (2018) Online network revenue management using Thompson sampling. Oper. Res. 66(6):1586–1602.
  • Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems, vol. 1, NIPS’10 (Curran Associates Inc., Red Hook, NY), 586–594.
  • Gan K, Keyvanshokooh E, Liu X, Murphy S (2024) Contextual bandits with budgeted information reveal. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 3970–3978.
  • Goff DC, Lloyd-Jones DM, Bennett G, Coady S, D’Agostino RB, Gibbons R, Greenland P, et al. (2014) 2013 ACC/AHA Guideline on the assessment of cardiovascular risk: A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J. Amer. College Cardiol. 63(25 Part B):2935–2959.
  • Goldenshluger A, Zeevi A (2013) A linear response bandit problem. Stoch. Syst. 3(1):230–261.
  • ACCORD Study Group (2010) Effects of intensive blood-pressure control in type 2 diabetes mellitus. New Engl. J. Med. 362(17):1575–1585.
  • Guan W, Liang W, Zhao Y, Liang H, Chen Z, Li Y, Liu X, et al. (2020) Comorbidity and its impact on 1590 patients with COVID-19 in China: A nationwide analysis. Eur. Respir. J. 55(5):2000547–2000577.
  • Hamidi N, Bayati M (2020) A general framework to analyze stochastic linear bandit. Working paper, Stanford University, Stanford, CA.
  • Helm JE, Lavieri MS, Van Oyen MP, Stein JD, Musch DC (2015) Dynamic forecasting and control algorithms of glaucoma progression for clinician decision support. Oper. Res. 63(5):979–999.
  • Hopp WJ, Li J, Wang G (2018) Big data and the precision medicine revolution. Production Oper. Management 27(9):1647–1664.
  • Iasonos A, O’Quigley J (2014) Adaptive dose-finding studies: A review of model-guided phase I clinical trials. J. Clin. Oncol. 32(23):2505–2511.
  • Ibrahim R, Kucukyazici B, Verter V, Gendreau M, Blostein M (2016) Designing personalized treatment: An application to anticoagulation therapy. Production Oper. Management 25(5):902–918.
  • James PA, Oparil S, Carter BL, Cushman WC, Dennison-Himmelfarb C, Handler J, Lackland DT, et al. (2014) 2014 evidence-based guideline for the management of high blood pressure in adults: Report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA 311(5):507–520.
  • Keyvanshokooh E (2021) Personalized data-driven learning and optimization: Theory and applications to healthcare. PhD thesis, University of Michigan at Ann Arbor, Ann Arbor.
  • Kjeldsen SE (2018) Hypertension and cardiovascular risk: General aspects. Pharmacol. Res. 129:95–99.
  • Kleinberg R, Slivkins A, Upfal E (2008) Multi-armed bandits in metric spaces. Proc. Fortieth Annual ACM Sympos. Theory Comput., STOC’08 (Association for Computing Machinery, New York), 681–690.
  • Lattimore T, Szepesvári C (2020) Bandit Algorithms (Cambridge University Press, Cambridge, UK).
  • Law M, Morris J, Wald N (2009) Use of blood pressure lowering drugs in the prevention of cardiovascular disease: Meta-analysis of 147 randomised trials in the context of expectations from prospective epidemiological studies. BMJ 338:1–19.
  • Lee CP, Chertow GM, Zenios SA (2008) Optimal initiation and management of dialysis therapy. Oper. Res. 56(6):1428–1449.
  • Lee EK, Wei X, Baker-Witt F, Wright MD, Quarshie A (2018) Outcome-driven personalized treatment design for managing diabetes. Interfaces 48(5):422–435.
  • Li L, Lu Y, Zhou D (2017) Provably optimal algorithms for generalized linear contextual bandits. Proc. 34th Internat. Conf. Machine Learn., vol. 70, ICML’17 (JMLR.org, Sydney, Australia), 2071–2080.
  • Liu X, Gan K, Keyvanshokooh E, Murphy S (2025) Online uniform sampling: Randomized learning-augmented approximation algorithms with application to digital health. Preprint, submitted February 3, https://arxiv.org/abs/2402.01995.
  • Mason JE, Denton BT, Shah ND, Smith SA (2014) Optimizing the simultaneous management of blood pressure and cholesterol for type 2 diabetes patients. Eur. J. Oper. Res. 233(3):727–738.
  • Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2020) Nonstationary bandits with habituation and recovery dynamics. Oper. Res. 68(5):1493–1516.
  • Moore BL, Pyeatt LD, Kulkarni V, Panousis P, Padrez K, Doufas AG (2014) Reinforcement learning for closed-loop propofol anesthesia: A study in human volunteers. J. Machine Learn. Res. 15(1):655–696.
  • Negoescu DM, Bimpikis K, Brandeau ML, Iancu DA (2018) Dynamic learning of patient response types: An application to treating chronic diseases. Management Sci. 64(8):3469–3488.
  • Padmanabhan R, Meskin N, Haddad WM (2017) Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Math. Biosci. 293:11–20.
  • Padmanabhan R, Meskin N, Haddad WM (2019) Optimal adaptive control of drug dosing using integral reinforcement learning. Math. Biosci. 309:131–142.
  • Palmer SC, Mavridis D, Navarese E, Craig JC, Tonelli M, Salanti G, Wiebe N, Ruospo M, Wheeler DC, Strippoli GF (2015) Comparative efficacy and safety of blood pressure-lowering agents in adults with diabetes and kidney disease: A network meta-analysis. Lancet 385(9982):2047–2056.
  • Qiao D, Yin M, Min M, Wang YX (2022) Sample-efficient reinforcement learning with loglog(T) switching cost. Internat. Conf. Machine Learn. (PMLR), 18031–18061.
  • Richman IB, Fairley M, Jørgensen ME, Schuler A, Owens DK, Goldhaber-Fiebert JD (2016) Cost-effectiveness of intensive blood pressure management. JAMA Cardiol. 1(8):872–879.
  • Rubino D, Abrahamsson N, Davies M, Hesse D, Greenway FL, Jensen C, Lingvay I, et al. (2021) Effect of continued weekly subcutaneous semaglutide vs placebo on weight loss maintenance in adults with overweight or obesity: The STEP 4 randomized clinical trial. JAMA 325(14):1414–1425.
  • Rusmevichientong P, Tsitsiklis JN (2010) Linearly parameterized bandits. Math. Oper. Res. 35(2):395–411.
  • Russo D, Van Roy B (2014) Learning to optimize via posterior sampling. Math. Oper. Res. 39(4):1221–1243.
  • Skandari MR, Shechter SM (2021) Patient-type Bayes-adaptive treatment plans. Oper. Res. 69(2):574–598.
  • Tunc S, Alagoz O, Burnside E (2014) Opportunities for operations research in medical decision making. IEEE Intell. Syst. 29(3):59–62.
  • Virani SS, Alonso A, Aparicio HJ, Benjamin EJ, Bittencourt MS, Callaway CW, Carson AP, et al. (2021) Heart disease and stroke statistics—2021 update: A report from the American Heart Association. Circulation 143(8):e254–e743.
  • Whelton PK, Carey RM, Aronow WS, Casey DE, Collins KJ, Himmelfarb CD, DePalma SM, et al. (2018) 2017 ACC/AHA/AAPA/ABC/ACPM/AGS/APHA/ASH/ASPC/NMA/PCNA guideline for the prevention, detection, evaluation, and management of high blood pressure in adults: A report of the American College of Cardiology/American Heart Association Task Force on Clinical Practice Guidelines. J. Amer. College Cardiol. 71(19):e127–e248.
  • WHO (2020) HEARTS: Technical package for cardiovascular disease management in primary health care.
  • Yang Y, Goldhaber-Fiebert JD, Wein LM (2013) Analyzing screening policies for childhood obesity. Management Sci. 59(4):782–795.
  • Yin G (2012) Clinical Trial Design: Bayesian and Frequentist Adaptive Methods, vol. 876 (John Wiley & Sons, Hoboken, NJ).
  • Yuan H, Luo Q, Shi C (2021) Marrying stochastic gradient descent with bandits: Learning algorithms for inventory systems with fixed costs. Management Sci. 67(10):6089–6115.
  • Zhou J, Liu J, Narayan VA, Ye J (2012) Modeling disease progression via fused sparse group lasso. Proc. 18th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining, KDD ’12 (Association for Computing Machinery, New York), 1095–1103.