Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making

Esmaeil Keyvanshokooh
[email protected]
https://orcid.org/0000-0001-9634-3806
Information and Operations Management, Mays Business School, Texas A&M University, College Station, Texas 77845

Mohammad Zhalechian
[email protected]
https://orcid.org/0000-0002-1174-6102
Operations and Decision Technologies, Kelley School of Business, Indiana University, Bloomington, Indiana 47405

Cong Shi
[email protected]
https://orcid.org/0000-0003-3564-3391
Management, Herbert Business School, University of Miami, Coral Gables, Florida 33146

Mark P. Van Oyen (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-8685-7843
Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48105

Pooyan Kazemian
[email protected]
https://orcid.org/0000-0002-2846-3862
Operations, Weatherhead School of Management, Case Western Reserve University, Cleveland, Ohio 44106

Published Online: 2 May 2025
https://doi.org/10.1287/mnsc.2019.03211

Volume 71, Issue 12

December 2025

Pages vii-x, 9869-10753, iv-vi

Information

Received: December 03, 2019
Accepted: October 22, 2023
Published Online: May 02, 2025

Cite as

Esmaeil Keyvanshokooh, Mohammad Zhalechian, Cong Shi, Mark P. Van Oyen, Pooyan Kazemian (2025) Contextual Learning with Online Convex Optimization: Theory and Application to Medical Decision-Making. Management Science 71(12):10442-10464.

https://doi.org/10.1287/mnsc.2019.03211

Acknowledgments

The authors thank the department editor Professor George Shanthikumar, the anonymous associate editor, and the anonymous referees for their constructive and detailed comments, which helped significantly improve both the content and the exposition of this paper. This paper was prepared using ACCORD Research Materials obtained from the NHLBI Biologic Specimen and Data Repository Information Coordinating Center and does not necessarily reflect the opinions or views of the ACCORD or the NHLBI.
