Learning the Minimal Representation of a Continuous State-Space Markov Decision Process from Transition Data

