Learning to Optimally Stop Diffusion Processes, with Financial Applications

Min Dai (Corresponding Author)
https://orcid.org/0000-0002-8270-9413
Department of Applied Mathematics at the Faculty of Computer and Mathematical Sciences and School of Accounting and Finance at the Faculty of Business, The Hong Kong Polytechnic University, Hong Kong, China

Yu Sun (Corresponding Author)
https://orcid.org/0000-0001-9061-3713
Peking University HSBC Business School, Shenzhen 518055, China

Zuo Quan Xu
https://orcid.org/0000-0001-6824-1634
Department of Applied Mathematics, Faculty of Computer and Mathematical Sciences, The Hong Kong Polytechnic University, Hong Kong, China

Xun Yu Zhou
https://orcid.org/0000-0001-9908-5697
Department of Industrial Engineering and Operations Research and The Data Science Institute, Columbia University, New York, New York 10027

Published Online: 25 Feb 2026
https://doi.org/10.1287/mnsc.2024.07614


Received: September 09, 2024
Accepted: October 07, 2025
Published Online: February 25, 2026

Cite as

Min Dai, Yu Sun, Zuo Quan Xu, Xun Yu Zhou (2026) Learning to Optimally Stop Diffusion Processes, with Financial Applications. Management Science 0(0).

https://doi.org/10.1287/mnsc.2024.07614
