Policy Optimization for Personalized Interventions in Behavioral Health

