Data-Pooling Reinforcement Learning for Preventative Healthcare Intervention

Xinyun Chen
https://orcid.org/0000-0003-1727-0923
School of Data Science, School of Management and Economics, The Chinese University of Hong Kong, Shenzhen 518172, China

Pengyi Shi (Corresponding Author)
https://orcid.org/0000-0003-0905-7858
Mitch Daniels School of Business, Purdue University, West Lafayette, Indiana 47907

Shanwen Pu
Shanghai University of Finance and Economics, Shanghai 200437, China

Published Online: 12 Dec 2025
https://doi.org/10.1287/mnsc.2023.03880

Articles In Advance

Information

Received: November 27, 2023
Accepted: March 04, 2025
Published Online: December 12, 2025

Cite as

Xinyun Chen, Pengyi Shi, Shanwen Pu (2025) Data-Pooling Reinforcement Learning for Preventative Healthcare Intervention. Management Science 0(0).

https://doi.org/10.1287/mnsc.2023.03880

Acknowledgments

Xinyun Chen and Pengyi Shi contributed equally.
