Learning to Optimally Stop Diffusion Processes, with Financial Applications

Published Online: https://doi.org/10.1287/mnsc.2024.07614

References

  • Ajdari A, Niyazi M, Nicolay NH, Thieke C, Jeraj R, Bortfeld T (2019) Towards optimal stopping in radiation therapy. Radiotherapy Oncology 134:96–100.
  • Barberis N (2012) A model of casino gambling. Management Sci. 58(1):35–51.
  • Becker S, Cheridito P, Jentzen A (2019) Deep optimal stopping. J. Machine Learn. Res. 20(74):1–25.
  • Becker S, Cheridito P, Jentzen A (2020) Pricing and hedging American-style options with deep learning. J. Risk Financial Management 13(7):158.
  • Becker S, Cheridito P, Jentzen A, Welti T (2021) Solving high-dimensional optimal stopping problems using deep learning. Eur. J. Appl. Math. 32(3):470–514.
  • Dai M, Dong Y (2024) Learning an optimal investment policy with transaction costs via a randomized Dynkin game. Preprint, submitted June 20, https://doi.org/10.2139/ssrn.4871712.
  • Dai M, Dong Y, Jia Y (2023) Learning equilibrium mean-variance strategy. Math. Finance 33(4):1166–1212.
  • Dai M, Kwok YK, You H (2007) Intensity-based framework and penalty formulation of optimal stopping problems. J. Econom. Dynam. Control 31(12):3860–3880.
  • Dai M, Dong Y, Jia Y, Zhou XY (2023) Learning Merton’s strategies in an incomplete market: Recursive entropy regularization and biased Gaussian exploration. Preprint, submitted December 20, https://doi.org/10.2139/ssrn.4668480.
  • Dai M, Dong Y, Jia Y, Zhou XY (2025) Data-driven Merton’s strategies via policy randomization. Preprint, submitted May 8, https://arxiv.org/abs/2312.11797.
  • Dai Z, Yu H, Low BKH, Jaillet P (2019) Bayesian optimization meets Bayesian optimal stopping. Internat. Conf. Machine Learn. (PMLR), 1496–1506.
  • Dianetti J, Ferrari G, Xu R (2024) Exploratory optimal stopping: A singular control formulation. Preprint, submitted August 18, https://arxiv.org/abs/2408.09335.
  • Dong Y (2024) Randomized optimal stopping problem in continuous time and reinforcement learning algorithm. SIAM J. Control Optim. 62(3):1590–1614.
  • Forsyth PA, Vetzal KR (2002) Quadratic convergence for valuing American options using a penalty method. SIAM J. Sci. Comput. 23(6):2095–2122.
  • Friedman A (1982) Variational Principles and Free Boundary Problems (Wiley, New York).
  • Grigelionis BI, Shiryaev AN (1966) On Stefan’s problem and optimal stopping rules for Markov processes. Theory Probab. Appl. 11(4):541–558.
  • Han J, Jentzen A, W E (2018) Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 115(34):8505–8510.
  • He XD, Hu S, Obłój J, Zhou XY (2017) Path-dependent and randomized strategies in Barberis’ casino gambling model. Oper. Res. 65(1):97–103.
  • Herrera C, Krach F, Ruyssen P, Teichmann J (2024) Optimal stopping via randomized neural networks. Frontiers Math. Finance 3(1):31–77.
  • Hu S, Obłój J, Zhou XY (2023) A casino gambling model under cumulative prospect theory: Analysis and algorithm. Management Sci. 69(4):2474–2496.
  • Jia Y, Zhou XY (2022a) Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. J. Machine Learn. Res. 23(154):1–55.
  • Jia Y, Zhou XY (2022b) Policy gradient and actor-critic learning in continuous time and space: Theory and algorithms. J. Machine Learn. Res. 23(275):1–50.
  • Jia Y, Zhou XY (2023) q-learning in continuous time. J. Machine Learn. Res. 24(161):1–61.
  • Jia Y, Ouyang D, Zhang Y (2025) Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning. Preprint, submitted March 13, https://arxiv.org/abs/2503.09981.
  • Jiang R, Saunders D, Weng C (2022) The reinforcement learning Kelly strategy. Quant. Finance 22(8):1445–1464.
  • Liang J, Hu B, Jiang L, Bian B (2007) On the rate of convergence of the binomial tree scheme for American options. Numerische Mathematik 107(2):333–352.
  • Liyanage YW, Zois DS, Chelmis C, Yao M (2019) Automating the classification of urban issue reports: An optimal stopping approach. IEEE Internat. Conf. Acoustics Speech Signal Processing (ICASSP), 3137–3141.
  • Munos R (2006) Policy gradient in continuous time. J. Machine Learn. Res. 7(27):771–791.
  • Peng Y, Wei P, Wei W (2024) Deep penalty methods: A class of deep learning algorithms for solving high dimensional optimal stopping problems. Preprint, submitted May 18, https://doi.org/10.2139/ssrn.4839092.
  • Pham H (2009) Continuous-Time Stochastic Control and Optimization with Financial Applications (Springer, Berlin).
  • Reppen AM, Soner HM, Tissot-Daguette V (2025) Neural optimal stopping boundary. Math. Finance 35(2):441–469.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Tallec C, Blier L, Ollivier Y (2019) Making deep q-learning methods robust to time discretization. Proc. 36th Internat. Conf. Machine Learn., vol. 97 (PMLR), 6096–6104.
  • Wang S, Perdikaris P (2021) Deep learning of free boundary and Stefan problems. J. Comput. Phys. 428:109914.
  • Wang H, Zhou XY (2020) Continuous-time mean–variance portfolio selection: A reinforcement learning framework. Math. Finance 30(4):1273–1308.
  • Wang H, Zariphopoulou T, Zhou XY (2020) Reinforcement learning in continuous time and space: A stochastic control approach. J. Machine Learn. Res. 21(198):1–34.
  • Wu B, Li L (2024) Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market. J. Econom. Dynam. Control 158:104787.
  • Yong J, Zhou XY (1999) Stochastic Controls: Hamiltonian Systems and HJB Equations (Springer, New York).