Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution

Jean-Loup Dupret (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-3716-8945
Department of Mathematics, RiskLab, ETH Zurich, 8092 Zurich, Switzerland

Donatien Hainaut
[email protected]
Institute of Statistics, Biostatistics and Actuarial Sciences (ISBA), Université Catholique de Louvain, 1348 Louvain-La-Neuve, Belgium

Published Online: 5 Feb 2026
https://doi.org/10.1287/opre.2024.1102

Volume 74, Issue 3

May-June 2026

Pages v-x, 1153-1728, iii-iv

Received: June 10, 2024
Accepted: January 08, 2026
Published Online: February 05, 2026

Cite as

Jean-Loup Dupret, Donatien Hainaut (2026) Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution. Operations Research 74(3):1241-1262.

https://doi.org/10.1287/opre.2024.1102

Acknowledgments

The authors thank the two anonymous referees for helpful comments related to this work. The data sets and code generated and/or analyzed during the current study are available from the corresponding author on reasonable request.
