Data-Pooling Reinforcement Learning for Preventative Healthcare Intervention

Published Online: https://doi.org/10.1287/mnsc.2023.03880

References

  • Anderson E, Durstine JL (2019) Physical activity, exercise, and chronic diseases: A brief review. Sports Med. Health Sci. 1(1):3–10.
  • Anonymous (2025) Technical companion for “data-pooling reinforcement learning for personalized healthcare intervention”. Accessed February 6, 2025, https://anonymous.4open.science/r/Data-pooling-Reinforcement-Learning-for-Personalized-Healthcare-Intervention-8143/tech_comp.pdf.
  • Arpita B, Gaurav A, Pradeep V, Milind T (2021) Learn to intervene: An adaptive learning policy for restless bandits in application to preventive healthcare. Proc. 13th Internat. Joint Conf. Artificial Intelligence, IJCAI 2021, Virtual Event, August 2021, 4039–4046.
  • Auer P, Ortner R (2006) Logarithmic online regret bounds for undiscounted reinforcement learning. Schölkopf B, Platt J, Hoffman T, eds. Advances in Neural Information Processing Systems, vol. 19. (MIT Press, Cambridge, MA).
  • Ayer T, Chen Q (2018) Covid-19: Novel coronavirus outbreak. Dai T, ed. Personalized Medicine, chapter 6 (John Wiley, Hoboken, NJ), 109–135.
  • Ayer T, Alagoz O, Stout NK, Burnside ES (2016) Heterogeneity in women’s adherence and its role in optimal breast cancer screening policies. Management Sci. 62(5):1339–1362.
  • Azar MG, Osband I, Munos R (2017) Minimax regret bounds for reinforcement learning. Internat. Conf. Machine Learn. (PMLR, New York), 263–272.
  • Barreto A, Dabney W, Munos R, Hunt JJ, Schaul T, van Hasselt HP, Silver D (2016) Successor features for transfer in reinforcement learning. Preprint, submitted June 16, https://doi.org/10.48550/arXiv.1606.05312.
  • Bastani H, Simchi-Levi D, Zhu R (2022) Meta dynamic pricing: Transfer learning across experiments. Management Sci. 68(3):1865–1881.
  • Bertsimas D, Silberholz J, Trikalinos T (2018) Optimal healthcare decision making under multiple mathematical models: Application in prostate cancer screening. Health Care Management Sci. 21(1):105–118.
  • Bonifonte A, Ayer T, Haaland B (2022) An analytics approach to guide randomized controlled trials in hypertension management. Management Sci. 68(9):6634–6647.
  • Bradtke SJ, Barto AG (1996) Linear least-squares algorithms for temporal difference learning. Machine Learn. 22(1):33–57.
  • Bresnick J (2016) Medication non-adherence brings millions in avoidable costs. HealthIT Analytics.
  • Bu J, Simchi-Levi D, Xu Y (2020) Online pricing with offline data: Phase transition and inverse square law. Internat. Conf. Machine Learn. (PMLR, New York), 1202–1210.
  • Chan TCY, Huang SY, Sarhangian V (2024) Dynamic control of service systems with returns: Application to design of postdischarge hospital readmission prevention programs. Oper. Res. 73(4):2242–2263.
  • Chen X, Wang L, Hang Y, Ge H, Zha H (2020) Infinite-horizon off-policy policy evaluation with multiple behavior policies. 8th Internat. Conf. Learn. Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020 (OpenReview.net).
  • Chick SE, Gans N, Yapar Ö (2022) Bayesian sequential learning for clinical trials of multiple correlated medical interventions. Management Sci. 68(7):4919–4938.
  • Efron B, Hastie T (2016) Computer Age Statistical Inference: Algorithms, Evidence, and Data Science (Cambridge University Press, Cambridge, UK).
  • Fernández F, Veloso M (2006) Probabilistic policy reuse in a reinforcement learning agent. Proc. Fifth Internat. Joint Conf. Autonomous Agents Multiagent Systems, 720–727.
  • Ferns N, Panangaden P, Precup D (2004) Metrics for finite Markov decision processes. UAI, vol. 4. 162–169.
  • Gautier J-F, Boitard C, Michiels Y, Raymond G, Vergez G, Guedon G (2021) Impact of personalized text messages from pharmacists on medication adherence in type 2 diabetes in France: A real-world, randomized, comparative study. Patient Education Counseling 104(9):2250–2258.
  • Gupta V (2022) Optimization in the small-data, large-scale regime. Chen X, Jasin S, Shi C, eds. The Elements of Joint Learning and Optimization in Operations Management (Springer, Berlin), 337–361.
  • Gupta V, Kallus N (2022) Data pooling in stochastic optimization. Management Sci. 68(3):1595–1615.
  • Gupta A, Mendonca R, Liu Y, Abbeel P, Levine S (2018) Meta-reinforcement learning of structured exploration strategies. Preprint, submitted February 20, https://doi.org/10.48550/arXiv.1802.07245.
  • Gur Y, Momeni A (2022) Adaptive sequential experiments with unknown information arrival processes. Manufacturing Service Oper. Management 24(5):2666–2684.
  • Heckman JJ (1976) The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Ann. Econom. Soc. Measurement 5(4):475–492.
  • Jaksch T, Ortner R, Auer P (2010) Near-optimal regret bounds for reinforcement learning. J. Machine Learn. Res. 11:1563–1600.
  • Kalvit A, Zeevi A (2021) A closer look at the worst-case behavior of multi-armed bandit algorithms. Adv. Neural Inform. Processing Systems 34:8807–8819.
  • Keyvanshokooh E, Zhalechian M, Shi C, Van Oyen MP, Kazemian P (2019) Contextual learning with online convex optimization: Theory and application to chronic diseases. Preprint, submitted December 10, https://ssrn.com/abstract=3501316.
  • Kini V, Ho PM (2018) Interventions to improve medication adherence: A review. JAMA 320(23):2461–2473.
  • Kuang X, Wager S (2024) Weak signal asymptotics for sequentially randomized experiments. Management Sci. 70(10):7024–7041.
  • Lazaric A (2012) Transfer in reinforcement learning: A framework and a survey. Wiering M, Otterlo M, eds. Reinforcement Learning: State-of-the-Art (Springer, Berlin), 143–173.
  • Liao P, Greenewald K, Klasnja P, Murphy S (2020) Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity. Proc. ACM Interactive Mobile Wearable Ubiquitous Technol. 4(1):1–22.
  • Liu X, Hu M, Helm JE, Lavieri MS, Skolarus TA (2018) Missed opportunities in preventing hospital readmissions: Redesigning post-discharge checkup policies. Production Oper. Management 27(12):2226–2250.
  • Löwe B, Unützer J, Callahan CM, Perkins AJ, Kroenke K (2004) Monitoring depression treatment outcomes with the patient health questionnaire-9. Medical Care 42(12):1194–1201.
  • Miao S, Chen X, Chao X, Liu J, Zhang Y (2022) Context-based dynamic pricing with online clustering. Production Oper. Management 31(9):3559–3575.
  • Min X, Yu B, Wang F (2019) Predictive modeling of the hospital readmission risk from patients’ claims data using machine learning: A case study on COPD. Sci. Rep. 9(1):2362.
  • Newman PM, Franke MF, Arrieta J, Carrasco H, Elliott P, Flores H, Friedman A, et al. (2018) Community health workers improve disease control and medication adherence among patients with diabetes and/or hypertension in Chiapas, Mexico: An observational stepped-wedge study. BMJ Global Health 3(1):e000566.
  • Osband I, Van Roy B, Wen Z (2016) Generalization and exploration via randomized value functions. Internat. Conf. Machine Learn. (PMLR, New York), 2377–2386.
  • Parisotto E, Ba LJ, Salakhutdinov R (2016) Actor-mimic: Deep multitask and transfer reinforcement learning. Bengio Y, LeCun Y, eds. 4th Internat. Conf. Learn. Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conf. Track Proc. (ICLR, Appleton, WI).
  • Piri H, Huh WT, Shechter SM, Hudson D (2022) Individualized dynamic patient monitoring under alarm fatigue. Oper. Res. 70(5):2749–2766.
  • Robinson R, Hudali T (2017) The hospital score and lace index as predictors of 30 day readmission in a retrospective study at a university-affiliated community hospital. PeerJ 5:e3137.
  • Russo D (2019) Worst-case regret bounds for exploration via randomized value functions. Preprint, submitted June 7, https://doi.org/10.48550/arXiv.1906.02870.
  • Rusu AA, Colmenarejo SG, Gülçehre Ç, Desjardins G, Kirkpatrick J, Pascanu R, Mnih V, Kavukcuoglu K, Hadsell R (2016) Policy distillation. Bengio Y, LeCun Y, eds. 4th Internat. Conf. Learn. Representations, ICLR 2016, San Juan, Puerto Rico, May 2–4, 2016, Conf. Track Proc. (ICLR, Appleton, WI).
  • Santos L (2022) The impact of nutrition and lifestyle modification on health. Eur. J. Internal Med. 97:18–25.
  • Shi P, Helm JE, Deglise-Hawkinson J, Pan J (2021) Timing it right: Balancing inpatient congestion vs. readmission risk at discharge. Oper. Res. 69(6):1842–1865.
  • Skandari MR, Shechter SM (2021) Patient-type Bayes-adaptive treatment plans. Oper. Res. 69(2):574–598.
  • Takchi R, Williams GA, Brauer D, Stoentcheva T, Wolf C, Van Anne B, Woolsey C, Hawkins WG (2020) Extending enhanced recovery after surgery protocols to the post-discharge setting: A phone call intervention to support patients after expedited discharge after pancreaticoduodenectomy. Amer. Surgeon 86(1):42–48.
  • Tomkins S, Liao P, Klasnja P, Murphy S (2021) IntelligentPooling: Practical Thompson sampling for mHealth. Machine Learn. 110(9):2685–2727.
  • Tunc S, Alagoz O, Burnside E (2014) Opportunities for operations research in medical decision making. IEEE Intelligent Systems 29(3):59–63.
  • Utomo CP, Kurniawati H, Li X, Pokharel S (2019) Personalised medicine in critical care using Bayesian reinforcement learning. Adv. Data Mining Applications 15th Internat. Conf., ADMA 2019, Dalian, China, November 21–23, 2019, Proc. 15 (Springer, Berlin), 648–657.
  • Vernon D, Brown JE, Griffiths E, Nevill AM, Pinkney M (2019) Reducing readmission rates through a discharge follow-up service. Future Healthcare J. 6(2):114–117.
  • Wen Z (2014) Efficient Reinforcement Learning with Value Function Generalization (Stanford University, Stanford, CA).
  • Wilder B, Suen S-C, Tambe M (2020) Allocating outreach resources for disease control in a dynamic population with information spread. IISE Trans. 53(6):629–642.
  • Xu Z, van Hasselt HP, Silver D (2018) Meta-gradient reinforcement learning. Preprint, submitted May 24, https://doi.org/10.48550/arXiv.1805.09801.
  • Yiadom MYAB, Domenico HJ, Byrne DW, Hasselblad M, Kripalani S, Choma N, Tucker-Marlow S, et al. (2020) Impact of a follow-up telephone call program on 30-day readmissions (futr-30): A pragmatic randomized controlled real-world effectiveness trial. Medical Care 58(9):785–792.
  • Yin H, Pan S (2017) Knowledge transfer for deep reinforcement learning with hierarchical experience replay. Proc. AAAI Conf. Artificial Intelligence, vol. 31 (AAAI Press, Washington, DC).
  • Zhang Z, Shi P, Ward AR (2022) Routing for fairness and efficiency in a queueing model with reentry and continuous customer classes. 2022 Amer. Control Conf. (ACC) (IEEE, Piscataway, NJ), 4882–4887.
  • Zhou M, Mintz Y, Fukuoka Y, Goldberg K, Flowers E, Kaminsky P, Castillejo A, Aswani A (2018) Personalizing mobile fitness apps using reinforcement learning. CEUR Workshop Proc., vol. 2068 (NIH Public Access).
  • Zhu F, Guo J, Li R, Huang J (2018) Robust actor-critic contextual bandit for mobile health (mHealth) interventions. Proc. 2018 ACM Internat. Conf. Bioinformatics Comput. Biol. Health Informatics (ACM, New York), 492–501.
  • Zhu Z, Lin K, Jain AK, Zhou J (2023) Transfer learning in deep reinforcement learning: A survey. IEEE Trans. Pattern Analysis Machine Intelligence 45(11):13344–13362.