Kernel-Based Distributed Q-Learning: A Scalable Reinforcement Learning Approach for Dynamic Treatment Regimes
References
- (2016) Adaptive interventions in child and adolescent mental health. J. Clinical Child Adolescent Psych. 45(4):383–395.
- (2007) Optimal rates for the regularized least-squares algorithm. Foundations Comput. Math. 7(3):331–368.
- (2013) Statistical Reinforcement Learning (Springer, Berlin).
- (2021) Risk bounds and Rademacher complexity in batch reinforcement learning. Meila M, Zhang T, eds. Proc. 38th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 139 (PMLR, New York), 2892–2902.
- (2003) Learning rates for Q-learning. J. Machine Learn. Res. 5:1–25.
- Fan J, Wang Z, Xie Y, Yang Z (2019) A theoretical analysis of deep Q-learning. Preprint, submitted January 1, https://arxiv.org/pdf/1901.00137.
- (2024) A survey on offline reinforcement learning: Taxonomy, review, and open problems. IEEE Trans. Neural Networks Learn. Systems 35(8):10237–10257.
- (2018) An introduction to deep reinforcement learning. Foundations Trends Machine Learn. 11(3–4):219–354.
- (2018) Addressing function approximation error in actor-critic methods. Dy J, Krause A, eds. Proc. 35th Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 80 (PMLR, New York), 1587–1596.
- (2010) Understanding the difficulty of training deep feedforward neural networks. Teh YW, Titterington M, eds. Proc. 13th Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 9 (PMLR, New York), 249–256.
- (2012) Q-learning with censored data. Ann. Statist. 40(1):529–560.
- (2016) Deep Learning (MIT Press, Cambridge, MA).
- (2009) Reinforcement learning: A tutorial survey and recent advances. INFORMS J. Comput. 21(2):178–192.
- (2006) A Distribution-Free Theory of Nonparametric Regression (Springer Science & Business Media, Boston).
- (2018) Soft actor-critic algorithms and applications. Preprint, submitted December 13, https://arxiv.org/abs/1812.05905.
- (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proc. IEEE Internat. Conf. Comput. Vision (IEEE Computer Society, Washington, DC), 1026–1034.
- (2017) Using reinforcement learning to personalize dosing strategies in a simulated cancer trial with high dimensional data. MS thesis, University of Arizona, Tucson.
- (2000) Olanzapine optimal dose: Results of an open-label multicenter study in schizophrenic patients. Psychiatry Clin. Neurosci. 54(4):467–478.
- (2019) When to trust your model: Model-based policy optimization. Wallach HM, Larochelle H, Beygelzimer A, d’Alché-Buc F, Fox EB, Garnett R, eds. Proc. 33rd Internat. Conf. Neural Inform. Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY), 12519–12530.
- (1998) Finite-sample convergence rates for Q-learning and indirect algorithms. Kearns MJ, Solla SA, Cohn DA, eds. Proc. 11th Internat. Conf. Neural Inform. Processing Systems, vol. 11 (MIT Press, Cambridge, MA), 996–1002.
- (2020) Offline reinforcement learning: Tutorial, review, and perspectives on open problems. Preprint, submitted May 4, https://arxiv.org/abs/2005.01643.
- (2016) Continuous control with deep reinforcement learning. Bengio Y, LeCun Y, eds. Proc. 4th Internat. Conf. Learn. Representations (OpenReview.net).
- (2018) Distributed kernel-based gradient descent algorithms. Constructive Approximation 47(2):249–276.
- (2017) Distributed learning with regularized least squares. J. Machine Learn. Res. 18(92):1–31.
- (2020) Distributed kernel ridge regression with communications. J. Machine Learn. Res. 21(93):1–38.
- (2022) Provably efficient kernelized Q-learning. Preprint, submitted April 21, https://arxiv.org/abs/2204.10349.
- (2016) Optimal learning rates for localized SVMs. J. Machine Learn. Res. 17(194):1–44.
- (2016) Asynchronous methods for deep reinforcement learning. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 48 (PMLR, New York), 1928–1937.
- (2005a) An experimental design for the development of adaptive treatment strategies. Statist. Medicine 24(10):1455–1481.
- (2005b) A generalization error for Q-learning. J. Machine Learn. Res. 6(37):1073–1097.
- (2022) Generalization error bounds of dynamic treatment regimes in penalized regression-based learning. Ann. Statist. 50(4):2047–2071.
- (2015) Distributed deep Q-learning. Preprint, submitted August 18, https://arxiv.org/abs/1508.04186.
- (2022) A deep Q-network for the beer game: Deep reinforcement learning for inventory optimization. Manufacturing Service Oper. Management 24(1):285–304.
- (2017) Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Math. Biosci. 293(3):11–20.
- (1994) Optimum bounds for the distributions of martingales in Banach spaces. Ann. Probability 22(4):1679–1706.
- (2017) Continuous state-space models for optimal sepsis treatment: A deep reinforcement learning approach. Doshi-Velez F, Fackler J, Kale D, Ranganath R, Wallace B, Wiens J, eds. Proc. Machine Learn. for Healthcare Conf., Proceedings of Machine Learning Research, vol. 68 (PMLR, New York), 147–163.
- (2015) Less is more: Nyström computational regularization. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 28 (Curran Associates, Inc., Red Hook, NY), 1657–1665.
- (2017) Proximal policy optimization algorithms. Preprint, submitted July 20, https://arxiv.org/abs/1707.06347.
- (2007) Duration of first-line chemotherapy in advanced non small-cell lung cancer: Less is more in the era of effective subsequent therapies. J. Clinical Oncology 25(33):5155–5157.
- (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
- (1999) Policy gradient methods for reinforcement learning with function approximation. Solla SA, Leen TK, Müller KR, eds. Proc. 12th Internat. Conf. Neural Inform. Processing Systems, vol. 12 (MIT Press, Cambridge, MA), 1057–1063.
- (2017) Deep reinforcement learning for automated radiation adaptation in lung cancer. Medical Phys. 44(12):6690–6705.
- (2019) Dynamic Treatment Regimes: Statistical Methods for Precision Medicine (Chapman and Hall/CRC, Boca Raton, FL).
- (2019) Variance-reduced Q-learning is minimax optimal. Preprint, submitted June 11, https://arxiv.org/abs/1906.04697.
- (2020) What are the statistical limits of offline RL with linear function approximation? Preprint, submitted October 22, https://arxiv.org/abs/2010.11895.
- (1992) Q-learning. Mach. Learn. 8(3–4):279–292.
- (2007) Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. Neural Networks 18(4):973–992.
- (2021) Reinforcement learning in healthcare: A survey. ACM Comput. Surveys 55(1):1–36.
- (2015) Divide and conquer kernel ridge regression: A distributed algorithm with minimax optimal rates. J. Machine Learn. Res. 16(102):3299–3340.
- (2009) Reinforcement learning design for cancer clinical trials. Statist. Medicine 28(26):3294–3315.

