Learning to Optimally Stop Diffusion Processes, with Financial Applications
Published Online: 25 Feb 2026. https://doi.org/10.1287/mnsc.2024.07614
References
- (2019) Towards optimal stopping in radiation therapy. Radiotherapy Oncology 134:96–100.
- (2012) A model of casino gambling. Management Sci. 58(1):35–51.
- (2019) Deep optimal stopping. J. Machine Learn. Res. 20(74):1–25.
- (2020) Pricing and hedging American-style options with deep learning. J. Risk Financial Management 13(7):158.
- (2021) Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math. 32(3):470–514.
- (2024) Learning an optimal investment policy with transaction costs via a randomized Dynkin game. Preprint, submitted June 20, https://doi.org/10.2139/ssrn.4871712.
- (2023) Learning equilibrium mean-variance strategy. Math. Finance 33(4):1166–1212.
- (2007) Intensity-based framework and penalty formulation of optimal stopping problems. J. Econom. Dynam. Control 31(12):3860–3880.
- (2023) Learning Merton’s strategies in an incomplete market: Recursive entropy regularization and biased Gaussian exploration. Preprint, submitted December 20, https://doi.org/10.2139/ssrn.4668480.
- (2025) Data-driven Merton’s strategies via policy randomization. Preprint, submitted May 8, https://arxiv.org/abs/2312.11797.
- (2019) Bayesian optimization meets Bayesian optimal stopping. Internat. Conf. Machine Learn. (PMLR), 1496–1506.
- (2024) Exploratory optimal stopping: A singular control formulation. Preprint, submitted August 18, https://arxiv.org/abs/2408.09335.
- (2024) Randomized optimal stopping problem in continuous time and reinforcement learning algorithm. SIAM J. Control Optim. 62(3):1590–1614.
- (2002) Quadratic convergence for valuing American options using a penalty method. SIAM J. Sci. Comput. 23(6):2095–2122.
- (1982) Variational Principles and Free Boundary Problems (Wiley, New York).
- (1966) On Stefan’s problem and optimal stopping rules for Markov processes. Theory Probab. Appl. 11(4):541–558.
- (2018) Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 115(34):8505–8510.
- (2017) Path-dependent and randomized strategies in Barberis’ casino gambling model. Oper. Res. 65(1):97–103.
- (2024) Optimal stopping via randomized neural networks. Frontiers Math. Finance 3(1):31–77.
- (2023) A casino gambling model under cumulative prospect theory: Analysis and algorithm. Management Sci. 69(4):2474–2496.
- (2022a) Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. J. Machine Learn. Res. 23(154):1–55.
- (2022b) Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. J. Machine Learn. Res. 23(275):1–50.
- (2023) q-learning in continuous time. J. Machine Learn. Res. 24(161):1–61.
- (2025) Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning. Preprint, submitted March 13, https://arxiv.org/abs/2503.09981.
- (2022) The reinforcement learning Kelly strategy. Quant. Finance 22(8):1445–1464.
- (2007) On the rate of convergence of the binomial tree scheme for American options. Numerische Mathematik 107(2):333–352.
- (2019) Automating the classification of urban issue reports: An optimal stopping approach. IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP), 3137–3141.
- (2006) Policy gradient in continuous time. J. Machine Learn. Res. 7(27):771–791.
- (2024) Deep penalty methods: A class of deep learning algorithms for solving high dimensional optimal stopping problems. Preprint, submitted May 18, https://doi.org/10.2139/ssrn.4839092.
- (2009) Continuous-Time Stochastic Control and Optimization with Financial Applications (Springer, Berlin).
- (2025) Neural optimal stopping boundary. Math. Finance 35(2):441–469.
- (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
- (2019) Making deep q-learning methods robust to time discretization. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR), 6096–6104.
- (2021) Deep learning of free boundary and Stefan problems. J. Comput. Phys. 428:109914.
- (2020) Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Math. Finance 30(4):1273–1308.
- (2020) Reinforcement learning in continuous time and space: A stochastic control approach. J. Machine Learn. Res. 21(198):1–34.
- (2024) Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market. J. Econom. Dynam. Control 158:104787.
- (1999) Stochastic Controls: Hamiltonian Systems and HJB Equations (Springer, New York).

