Learning to Optimally Stop Diffusion Processes, with Financial Applications

Min Dai (Corresponding Author)
https://orcid.org/0000-0002-8270-9413
Department of Applied Mathematics at the Faculty of Computer and Mathematical Sciences and School of Accounting and Finance at the Faculty of Business, The Hong Kong Polytechnic University, Hong Kong, China

Yu Sun (Corresponding Author)
https://orcid.org/0000-0001-9061-3713
Peking University HSBC Business School, Shenzhen 518055, China

Zuo Quan Xu
https://orcid.org/0000-0001-6824-1634
Department of Applied Mathematics, Faculty of Computer and Mathematical Sciences, The Hong Kong Polytechnic University, Hong Kong, China

Xun Yu Zhou
https://orcid.org/0000-0001-9908-5697
Department of Industrial Engineering and Operations Research and The Data Science Institute, Columbia University, New York, New York 10027

Published Online: 25 Feb 2026
https://doi.org/10.1287/mnsc.2024.07614

Abstract

We study optimal stopping for diffusion processes with unknown model primitives within the continuous-time reinforcement learning (RL) framework of Wang, Zariphopoulou, and Zhou [Wang H, Zariphopoulou T, Zhou XY (2020) J. Machine Learn. Res. 21(198):1–34], and present applications to option pricing and portfolio choice. By penalizing the corresponding variational inequality formulation, we transform the stopping problem into a stochastic optimal control problem with two actions. We then randomize controls into Bernoulli distributions and add an entropy regularizer to encourage exploration. We derive a semianalytical optimal Bernoulli distribution, based on which we devise RL algorithms using the martingale approach of Jia and Zhou [Jia Y, Zhou XY (2022a) J. Machine Learn. Res. 23(154):1–55]. We establish a policy improvement theorem and prove the fast convergence of the resulting policy iterations. We demonstrate the effectiveness of the algorithms in pricing finite-horizon American put options, solving Merton’s problem with transaction costs, and scaling to high-dimensional optimal stopping problems. In particular, we show that both the offline and online algorithms achieve high accuracy in learning the value functions and characterizing the associated free boundaries.
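
To illustrate why randomizing a two-action control into a Bernoulli distribution and adding an entropy regularizer can yield a closed-form policy, the following is a minimal generic sketch and not the paper's exact formula. It assumes a hypothetical one-step objective p * advantage + temperature * H(p), where p is the probability of choosing the first action, H is the Bernoulli entropy, and the names `advantage` and `temperature` are illustrative assumptions. Setting the derivative in p to zero gives the logistic form p = 1 / (1 + exp(-advantage / temperature)), which the code checks against a brute-force grid search.

```python
import math


def bernoulli_entropy(p: float) -> float:
    """Shannon entropy (in nats) of a Bernoulli(p) distribution."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limiting value of the entropy at the endpoints
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def regularized_objective(p: float, advantage: float, temperature: float) -> float:
    """Hypothetical one-step objective: expected gain plus entropy bonus."""
    return p * advantage + temperature * bernoulli_entropy(p)


def optimal_stop_probability(advantage: float, temperature: float) -> float:
    """Closed-form maximizer of the objective above (a logistic function)."""
    z = advantage / temperature
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # branch on the sign of z to avoid overflow in exp
    return e / (1.0 + e)


def grid_argmax(advantage: float, temperature: float, n: int = 10001) -> float:
    """Brute-force maximizer over an evenly spaced grid on [0, 1]."""
    grid = [i / (n - 1) for i in range(n)]
    return max(grid, key=lambda p: regularized_objective(p, advantage, temperature))


if __name__ == "__main__":
    a, lam = 0.3, 0.5  # illustrative values only
    print("closed form:", optimal_stop_probability(a, lam))
    print("grid search:", grid_argmax(a, lam))
```

In this sketch a smaller `temperature` pushes the probability toward 0 or 1 (less exploration), while a larger one pulls it toward 1/2.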

This paper has been accepted by Kay Giesecke for the Special Issue on AI for Finance and Business Decisions.

Funding: M. Dai was supported by the Hong Kong Research Grants Council [15212324, 15217123, 15213422, and T32-615/24-R], the Hong Kong Polytechnic University [P0042708, P0042456, and P0039114], and the Key Project of National Natural Science Foundation of China [72432005]. Y. Sun acknowledges financial support from a start-up fund at Peking University HSBC Business School. Z. Xu acknowledges financial support from The Hong Kong RGC [GRF 15204622, 15203423], NSFC 12571517, the PolyU-SDU Joint Research Center on Financial Mathematics, the CAS AMSS-PolyU Joint Laboratory of Applied Mathematics, the Research Centre for Quantitative Finance (1-CE03), and internal grants from The Hong Kong Polytechnic University. X. Zhou was supported by the Nie Center for Intelligent Asset Management at Columbia University. This work was also part of a Columbia-CityU/HK collaborative project that was supported by the InnoHK Initiative, the Government of the HKSAR, and the AIFT Lab.

Supplemental Material: The online appendix and data files are available at https://doi.org/10.1287/mnsc.2024.07614.

Received: September 09, 2024
Accepted: October 07, 2025
Published Online: February 25, 2026

Cite as

Min Dai, Yu Sun, Zuo Quan Xu, Xun Yu Zhou (2026) Learning to Optimally Stop Diffusion Processes, with Financial Applications. Management Science 0(0).

https://doi.org/10.1287/mnsc.2024.07614
