Policy Optimization for Personalized Interventions in Behavioral Health

Jackie Baek
https://orcid.org/0000-0001-5538-509X
Stern School of Business, New York University, New York, New York 10012

Justin J. Boutilier
https://orcid.org/0000-0003-0904-4467
Telfer School of Management, University of Ottawa, Ottawa, Ontario K1N 9B9, Canada

Vivek F. Farias
https://orcid.org/0000-0002-5856-9246
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

Jónas Oddur Jónasson (Corresponding Author)
https://orcid.org/0000-0003-2316-684X
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

Erez Yoeli
https://orcid.org/0000-0002-8459-017X
Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

Published Online: 19 Mar 2025
https://doi.org/10.1287/msom.2023.0548

Manufacturing & Service Operations Management

Volume 27, Issue 3

May-June 2025

Pages iv-xx, 679-992, C2

Received: September 24, 2023
Accepted: October 18, 2024
Published Online: March 19, 2025

Cite as

Jackie Baek, Justin J. Boutilier, Vivek F. Farias, Jónas Oddur Jónasson, Erez Yoeli (2025) Policy Optimization for Personalized Interventions in Behavioral Health. Manufacturing & Service Operations Management 27(3):770-788.

https://doi.org/10.1287/msom.2023.0548

Acknowledgments

The authors are grateful to Jon Rathauser, founder and CEO of Keheala, for the collaboration. The authors also thank the Keheala Operational Team (Alice, Faith, Edwin, Jacinta, Jill, Lewis, Moreen, Trish).
