Policy Optimization for Personalized Interventions in Behavioral Health

Published Online: https://doi.org/10.1287/msom.2023.0548

References

  • Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel R, Bartlett P, Pereira F, Weinberger KQ, eds. Proc. 25th Internat. Conf. Neural Inform. Processing Systems (NIPS’11), vol. 25 (Curran Associates Inc., Red Hook, NY), 2312–2320.
  • Adelman D, Mersereau AJ (2008) Relaxations of weakly coupled stochastic dynamic programs. Oper. Res. 56(3):712–727.
  • Agrawal S, Goyal N (2013) Thompson sampling for contextual bandits with linear payoffs. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 28, no. 3 (PMLR, New York), 127–135.
  • Ansell PS, Glazebrook KD, Niño-Mora J, O’Keeffe M (2003) Whittle’s index policy for a multi-class queueing system with convex holding costs. Math. Methods Oper. Res. 57:21–39.
  • Aswani A, Kaminsky P, Mintz Y, Flowers E, Fukuoka Y (2019) Behavioral modeling in weight loss interventions. Eur. J. Oper. Res. 272(3):1058–1072.
  • Avrachenkov KE, Borkar VS (2022) Whittle index based Q-learning for restless bandits with average reward. Automatica 139:110186.
  • Biswas A, Aggarwal G, Varakantham P, Tambe M (2021) Learn to intervene: An adaptive learning policy for restless bandits in application to preventive healthcare. Preprint, submitted May 17, https://arxiv.org/abs/2105.07965.
  • Bosworth HB, Granger BB, Mendys P, Brindis R, Burkholder R, Czajkowski SM, Daniel JG, et al. (2011) Medication adherence: A call for action. Amer. Heart J. 162(3):412–424.
  • Boutilier JJ, Jónasson JO, Yoeli E (2022) Improving tuberculosis treatment adherence support: The case for targeted behavioral interventions. Manufacturing Service Oper. Management 24(6):2925–2943.
  • Brandfonbrener D, Whitney WF, Ranganath R, Bruna J (2021) Offline RL without off-policy evaluation. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Proc. 35th Internat. Conf. Neural Inform. Processing Systems (NIPS’21), vol. 35 (Curran Associates Inc., Red Hook, NY), 1–14.
  • Chernozhukov V, Chetverikov D, Demirer M, Duflo E, Hansen C, Newey W, Robins J (2018) Double/debiased machine learning for treatment and structural parameters. Econometrics J. 21(1):C1–C68.
  • D’Aeth JC, Ghosal S, Grimm F, Haw D, Koca E, Lau K, Liu H, et al. (2023) Optimal hospital care scheduling during the SARS-CoV-2 pandemic. Management Sci. 69(10):5923–5947.
  • Fu J, Nazarathy Y, Moka S, Taylor PG (2019) Towards Q-learning the Whittle index for restless bandits. 2019 Australian New Zealand Control Conf. (IEEE, Piscataway, NJ), 249–254.
  • Garfein RS, Doshi RP (2019) Synchronous and asynchronous video observed therapy (VOT) for tuberculosis treatment adherence monitoring and support. J. Clinical Tuberculosis Other Mycobacterial Diseases 17:100098.
  • Gilbert EN (1960) Capacity of a burst-noise channel. Bell System Tech. J. 39(5):1253–1265.
  • Glazebrook KD, Mitchell HM (2002) An index policy for a stochastic scheduling model with improving/deteriorating jobs. Naval Res. Logist. 49(7):706–721.
  • Glazebrook KD, Ruiz-Hernandez D, Kirkbride C (2006) Some indexable families of restless bandit problems. Adv. Appl. Probab. 38(3):643–672.
  • Gong X-Y, Goyal V, Iyengar GN, Simchi-Levi D, Udwani R, Wang S (2021) Online assortment optimization with reusable resources. Management Sci. 68(7):4772–4785.
  • Greenewald K, Tewari A, Klasnja P, Murphy S (2017) Action centered contextual bandits. Guyon I, Von Luxburg U, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R, eds. Proc. 30th Internat. Conf. Neural Inform. Processing Systems (NIPS’17) (Curran Associates Inc., Red Hook, NY), 5979–5987.
  • Guha S, Munagala K, Shi P (2010) Approximation algorithms for restless bandit problems. J. ACM 58(1):1–50.
  • Howard RA (1960) Dynamic Programming and Markov Processes (MIT Press, Boston).
  • Jin C, Allen-Zhu Z, Bubeck S, Jordan MI (2018) Is Q-learning provably efficient? Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, eds. Proc. 32nd Internat. Conf. Neural Inform. Processing Systems (NIPS’18) (Curran Associates Inc., Red Hook, NY), 4868–4878.
  • Jung YH, Tewari A (2019) Regret bounds for Thompson sampling in episodic restless bandit problems. Adv. Neural Inform. Processing Systems, vol. 32.
  • Lei H, Tewari A, Murphy SA (2017) An actor-critic contextual bandit algorithm for personalized mobile health interventions. Preprint, submitted June 28, https://arxiv.org/abs/1706.09090.
  • Levine S, Kumar A, Tucker G, Fu J (2020) Offline reinforcement learning: Tutorial, review, and perspectives on open problems. Preprint, submitted May 4, https://arxiv.org/abs/2005.01643.
  • Li S, Wang B, Zhang S, Chen W (2016) Contextual combinatorial cascading bandits. Proc. 33rd Internat. Conf. Machine Learn. (ICML’16), vol. 48 (JMLR.org, New York), 1245–1253.
  • Liao P, Greenewald K, Klasnja P, Murphy S (2020) Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity. Proc. ACM Interactive Mobile Wearable Ubiquitous Tech. 4(1):1–22.
  • Liu K, Zhao Q (2010) Indexability of restless bandit problems and optimality of Whittle index for dynamic multichannel access. IEEE Trans. Inform. Theory 56(11):5547–5567.
  • Mate A, Killian J, Xu H, Perrault A, Tambe M (2020) Collapsing bandits and their application to public health intervention. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Proc. 34th Internat. Conf. Neural Inform. Processing Systems (NIPS’20) (Curran Associates Inc., Red Hook, NY).
  • Mate A, Madaan L, Taneja A, Madhiwalla N, Verma S, Singh G, Hegde A, Varakantham P, Tambe M (2022) Field study in deploying restless multi-armed bandits: Assisting non-profits in improving maternal and child health. Proc. AAAI Conf. Artificial Intelligence 36(11):12017–12025.
  • Meuleau N, Hauskrecht M, Kim K-E, Peshkin L, Kaelbling LP, Dean TL, Boutilier C (1998) Solving very large weakly coupled Markov decision processes. Proc. Fifteenth Natl./Tenth Conf. Artificial Intelligence/Innovative Appl. Artificial Intelligence (AAAI’98/IAAI’98) (American Association for Artificial Intelligence, Palo Alto, CA), 165–172.
  • Mills S (2022) Personalized nudging. Behav. Public Policy 6(1):150–159.
  • Mintz Y, Aswani A, Kaminsky P, Flowers E, Fukuoka Y (2020) Nonstationary bandits with habituation and recovery dynamics. Oper. Res. 68(5):1493–1516.
  • Naeini MP, Cooper GF, Hauskrecht M (2015) Obtaining well-calibrated probabilities using Bayesian binning. Proc. Twenty-Ninth AAAI Conf. Artificial Intelligence (AAAI’15) (AAAI Press, Palo Alto, CA), 2901–2907.
  • Niño-Mora J (2020) A fast-pivoting algorithm for Whittle’s restless bandit index. Mathematics 8(12):2226.
  • Papadimitriou CH, Tsitsiklis JN (1994) The complexity of optimal queueing network control. Proc. IEEE Ninth Annual Conf. Structure Complexity Theory (IEEE, Piscataway, NJ), 318–322.
  • Qin L, Chen S, Zhu X (2014) Contextual combinatorial bandit and its application on diversified online recommendation. Zaki M, Obradovic Z, Tan PN, Banerjee A, Kamath C, Parthasarathy S, eds. Proc. 2014 SIAM Internat. Conf. Data Mining (SDM) (SIAM, Philadelphia), 461–469.
  • Ruggeri K, Benzerga A, Verra S, Folke T (2023) A behavioral approach to personalizing public health. Behav. Public Policy 7(2):457–469.
  • Schmittlein DC, Morrison DG, Colombo R (1987) Counting your customers: Who are they and what will they do next? Management Sci. 33(1):1–24.
  • Suen S-C, Bendavid E, Goldhaber-Fiebert JD (2014) Disease control implications of India’s changing multi-drug resistant tuberculosis epidemic. PLoS One 9(3):e89822.
  • Suen S-C, Brandeau ML, Goldhaber-Fiebert JD (2018) Optimal timing of drug sensitivity testing for patients on first-line tuberculosis treatment. Health Care Management Sci. 21(4):632–646.
  • Suen S-C, Negoescu D, Goh J (2022) Design of incentive programs for optimal medication adherence in the presence of observable consumption. Oper. Res. 70(3):1691–1716.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Szepesvári C (2022) Algorithms for Reinforcement Learning (Springer Nature, London).
  • Wang R, Foster DP, Kakade SM (2020a) What are the statistical limits of offline RL with linear function approximation? Preprint, submitted October 22, https://arxiv.org/abs/2010.11895.
  • Wang S, Huang L, Lui JCS (2020b) Restless-UCB, an efficient and low-complexity algorithm for online restless bandits. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Proc. 34th Internat. Conf. Neural Inform. Processing Systems (NIPS’20) (Curran Associates Inc., Red Hook, NY), 11878–11889.
  • Weber RR, Weiss G (1990) On an index policy for restless bandits. J. Appl. Probab. 27(3):637–648.
  • Whittle P (1988) Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 25(A):287–298.
  • World Health Organization (2022) Global Tuberculosis Report 2022 (World Health Organization, Geneva), xiii, 51 p.
  • Yoeli E, Rathauser J, Bhanot SP, Kimenye MK, Mailu E, Masini E, Owiti P, Rand D (2019) Digital health support in treatment for tuberculosis. New England J. Medicine 381(10):986–987.