Data-Pooling Reinforcement Learning for Preventative Healthcare Intervention
Published Online: 12 Dec 2025, https://doi.org/10.1287/mnsc.2023.03880

