Published Online: https://doi.org/10.1287/educ.2025.0293

Abstract

As a paradigm for sequential decision making in unknown environments, reinforcement learning (RL) has received a flurry of attention in recent years. However, the explosion of model complexity in emerging applications and the presence of nonconvexity exacerbate the challenge of achieving efficient RL in sample-starved situations, where data collection is expensive, time-consuming, or even high stakes (e.g., in clinical trials, autonomous systems, and online advertising). Understanding and enhancing the sample and computational efficiency of RL algorithms is thus of great interest. In this tutorial, we aim to introduce several important algorithmic and theoretical developments in RL, highlighting the connections between new ideas and classical topics. Employing Markov decision processes as the central mathematical model, we cover several distinctive RL scenarios (i.e., RL with a simulator, online RL, offline RL, robust RL, and RL with human feedback) and present several mainstream RL approaches (i.e., model-based approach, value-based approach, and policy optimization). Our discussions gravitate around the issues of sample complexity, computational efficiency, and algorithm-dependent and information-theoretic lower bounds from a nonasymptotic viewpoint.

Funding: Y. Chi was supported in part by the National Science Foundation (NSF) [Grants CCF-2106778 and DMS-2134080], Air Force Research Laboratory [Grant FA8750-20-2-050], and Office of Naval Research (ONR) [Grant N00014-19-1-2404]. Y. Chen was supported in part by the Alfred P. Sloan Research Fellowship, the Google Research Scholar Award, the Air Force Office of Scientific Research [Grants FA9550-19-1-0030 and FA9550-22-1-0198], the ONR [Grant N00014-22-1-2354], and the NSF [Grants CCF-2221009 and CCF-1907661]. Y. Wei was supported in part by the Google Research Scholar Award and the NSF [Grants CCF-2106778 and DMS-2147546/2015447 and CAREER Award DMS-2143215].
