Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution

Published Online: https://doi.org/10.1287/opre.2024.1102

References

  • Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791.
  • Allen-Zhu Z, Li Y, Song Z (2019) A convergence theory for deep learning via over-parameterization. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (PMLR, New York), 242–252.
  • Almgren R (2003) Optimal execution with nonlinear impact functions and trading-enhanced risk. Appl. Math. Finance 10(1):1–18.
  • Almgren R, Chriss N (2001) Optimal execution of portfolio transactions. J. Risk 3(2):5–40.
  • Almgren R, Thum C, Hauptmann E, Li H (2005) Direct estimation of equity market impact. Risk 18(7):58–62.
  • Bachouch A, Huré C, Langrené N, Pham H (2022) Deep neural networks algorithms for stochastic control problems on finite horizon: Numerical applications. Methodology Comput. Appl. Probab. 24(1):143–178.
  • Bai Y, Chaolu T, Bilige S (2022) The application of improved physics-informed neural network (IPINN) method in finance. Nonlinear Dynam. 107(4):3655–3667.
  • Beck C, Becker S, Cheridito P, Jentzen A, Neufeld A (2021a) Deep splitting method for parabolic PDEs. SIAM J. Sci. Comput. 43(5):A3135–A3154.
  • Beck C, Becker S, Grohs P, Jaafari N, Jentzen A (2021b) Solving the Kolmogorov PDE by means of deep learning. J. Sci. Comput. 88(3):73.
  • Bellemare MG, Dabney W, Munos R (2017) A distributional perspective on reinforcement learning. Precup D, Teh YW, eds. Internat. Conf. Machine Learn. (PMLR), 449–458.
  • Berg J, Nyström K (2018) A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing 317(1):28–41.
  • Berner J, Dablander M, Grohs P (2020) Numerically solving parametric families of high-dimensional Kolmogorov partial differential equations via deep learning. Adv. Neural Inform. Processing Systems 33(1):16615–16627.
  • Bertsekas D (2023) A Course in Reinforcement Learning (Athena Scientific, Nashua, NH).
  • Blechschmidt J, Ernst OG (2021) Three ways to solve partial differential equations with neural networks—A review. GAMM-Mitteilungen 44(2):e202100006.
  • Cartea Á, Jaimungal S (2016) Incorporating order-flow into optimal execution. Math. Financial Econom. 10(3):339–364.
  • Cartea Á, Gan L, Jaimungal S (2019) Trading co-integrated assets with price impact. Math. Finance 29(2):542–567.
  • Cartea Á, Jaimungal S, Penalva J (2015) Algorithmic and High-Frequency Trading (Cambridge University Press, Cambridge, UK).
  • Cartea Á, Jaimungal S, Ricci J (2014) Buy low, sell high: A high frequency trading perspective. SIAM J. Financial Math. 5(1):415–444.
  • Cheridito P, Dupret J-L, Hainaut D (2025) Deep learning for continuous-time stochastic control with jumps. Preprint, submitted May 21, https://arxiv.org/abs/2505.15602.
  • Domingo-Enrich C, Drozdzal M, Karrer B, Chen RT (2024a) Adjoint matching: Fine-tuning flow and diffusion generative models with memoryless stochastic optimal control. Preprint, submitted September 13, https://arxiv.org/abs/2409.08861.
  • Domingo-Enrich C, Han J, Amos B, Bruna J, Chen RT (2024b) Stochastic optimal control matching. Adv. Neural Inform. Processing Systems 37(1):112459–112504.
  • Du S, Lee J, Li H, Wang L, Zhai X (2019) Gradient descent finds global minima of deep neural networks. Chaudhuri K, Salakhutdinov R, eds. Internat. Conf. Machine Learn. (PMLR, New York), 1675–1685.
  • Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P (2016) Benchmarking deep reinforcement learning for continuous control. Balcan MF, Weinberger KQ, eds. Internat. Conf. Machine Learn. (PMLR, New York), 1329–1338.
  • Duan J, Li J, Ge Q, Li SE, Bujarbaruah M, Ma F, Zhang D (2023) Relaxed actor-critic with convergence guarantees for continuous-time optimal control of nonlinear systems. IEEE Trans. Intelligent Vehicles 8(5):3299–3311.
  • Fecamp S, Mikael J, Warin X (2021) Deep learning for discrete-time hedging in incomplete markets. J. Comput. Finance 25(2):51–85.
  • Fujimoto S, Hoof H, Meger D (2018) Addressing function approximation error in actor-critic methods. Dy J, Krause A, eds. Internat. Conf. Machine Learn. (PMLR, New York), 1587–1596.
  • Gatheral J (2010) No-dynamic-arbitrage and market impact. Quant. Finance 10(7):749–759.
  • Greif C (2017) Numerical methods for Hamilton–Jacobi–Bellman equations. MS thesis, University of Wisconsin–Milwaukee, Milwaukee.
  • Haarnoja T, Zhou A, Abbeel P, Levine S (2018) Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. Dy J, Krause A, eds. Internat. Conf. Machine Learn. (PMLR), 1861–1870.
  • Hainaut D (2024) Valuation of guaranteed minimum accumulation benefits (GMAB) with physics inspired neural networks. Ann. Actuar. Sci. 18(2):442–473.
  • Hainaut D, Dupret J-L (2026) Optimal control by policy iterations and constrained Gaussian process regressions. Eur. J. Oper. Res. 330(2):525–539.
  • Han J, Hu R (2021) Recurrent neural networks for stochastic control problems with delay. Math. Control Signals Systems 33(4):775–795.
  • Han J, Weinan E (2016) Deep learning approximation for stochastic control problems. Preprint, submitted November 2, https://arxiv.org/abs/1611.07422.
  • Han J, Jentzen A, Weinan E (2017) Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Comm. Math. Statist. 5(4):349–380.
  • Han J, Jentzen A, Weinan E (2018) Solving high-dimensional partial differential equations using deep learning. Proc. Natl. Acad. Sci. USA 115(34):8505–8510.
  • Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Networks 2(5):359–366.
  • Howard RA (1960) Dynamic Programming and Markov Processes (John Wiley, New York).
  • Hu R, Lauriere M (2023) Recent developments in machine learning methods for stochastic control and games. Preprint, submitted March 18, http://dx.doi.org/10.2139/ssrn.4096569.
  • Huré C, Pham H, Bachouch A, Langrené N (2021) Deep neural networks algorithms for stochastic control problems on finite horizon: Convergence analysis. SIAM J. Numer. Anal. 59(1):525–557.
  • Hutzenthaler M, Jentzen A, Kruse T, Nguyen TA (2020) A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differential Equations Appl. 1(2):10.
  • Jacka SD, Mijatović A (2017) On the policy improvement algorithm in continuous time. Stochastics 89(1):348–359.
  • Jentzen A, Kuckuck B, Neufeld A, von Wurstemberger P (2021) Strong error analysis for stochastic gradient descent optimization algorithms. IMA J. Numer. Anal. 41(1):455–492.
  • Ji S, Peng S, Peng Y, Zhang X (2022) Solving stochastic optimal control problem via stochastic maximum principle with deep learning method. J. Sci. Comput. 93(1):30.
  • Li Y (2017) Deep reinforcement learning: An overview. Preprint, submitted January 25, https://doi.org/10.48550/arXiv.1701.07274.
  • Li Y, Forsyth PA (2019) A data-driven neural network approach to optimal asset allocation for target based defined contribution pension plans. Insurance Math. Econom. 86(1):189–204.
  • Li X, Verma D, Ruthotto L (2024) A neural network approach for stochastic optimal control. SIAM J. Sci. Comput. 46(5):C535–C556.
  • Li Y, Guo J, Lai KK, Shi J (2022) Optimal portfolio liquidation with cross-price impacts on trading. Oper. Res. 22(2):1083–1102.
  • Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2019) Continuous control with deep reinforcement learning. Preprint, submitted September 9, https://arxiv.org/abs/1509.02971.
  • Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M (2013) Playing Atari with deep reinforcement learning. Preprint, submitted December 19, https://arxiv.org/abs/1312.5602.
  • Mnih V, Puigdomènech Badia A, Mirza M, Graves A, Lillicrap TP, Harley T, Silver D, Kavukcuoglu K (2016) Asynchronous methods for deep reinforcement learning. Preprint, submitted February 4, https://doi.org/10.48550/arXiv.1602.01783.
  • Mowlavi S, Nabi S (2023) Optimal control of PDEs using physics-informed neural networks. J. Comput. Phys. 473(1):111731.
  • Muhle-Karbe J, Wang Z, Webster K (2024) Stochastic liquidity as a proxy for nonlinear price impact. Oper. Res. 72(2):444–458.
  • Nakamura-Zimmerer T, Gong Q, Kang W (2021) Adaptive deep learning for high-dimensional Hamilton–Jacobi–Bellman equations. SIAM J. Sci. Comput. 43(2):A1221–A1247.
  • Nüsken N, Richter L (2021) Solving high-dimensional Hamilton–Jacobi–Bellman PDEs using neural networks: Perspectives from the theory of controlled diffusions and measures on path space. Partial Differential Equations Appl. 2(4):48.
  • Oksendal B (2013) Stochastic Differential Equations: An Introduction with Applications (Springer Science & Business Media, Berlin).
  • Peyrl H, Herzog F, Geering HP (2005) Numerical solution of the Hamilton–Jacobi–Bellman equation for stochastic optimal control problems. Georgi JN, Lazakidou AA, Otestenau M, Niola V, eds. Proc. 2005 WSEAS Internat. Conf. Dynamical Systems Control (Stevens Point, Wisconsin), 489–497.
  • Raissi M, Perdikaris P, Karniadakis GE (2017) Physics informed deep learning (part I): Data-driven solutions of nonlinear partial differential equations. Preprint, submitted November 28, https://arxiv.org/abs/1711.10561.
  • Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys. 378(1):686–707.
  • Schmidhuber J, Hochreiter S (1997) Long short-term memory. Neural Comput. 9(8):1735–1780.
  • Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. Preprint, submitted July 20, https://arxiv.org/abs/1707.06347.
  • Sirignano J, Spiliopoulos K (2018) DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys. 375(1):1339–1364.
  • Srivastava RK, Greff K, Schmidhuber J (2015) Highway networks. Preprint, submitted May 3, https://arxiv.org/abs/1505.00387.
  • Sutton RS, Barto AG (2018) Reinforcement Learning: An Introduction (MIT Press, Cambridge, MA).
  • Touzi N (2013) Optimal Stochastic Control, Stochastic Target Problems, and Backward SDE, Fields Institute Monographs, vol. 29 (Springer Science & Business Media, New York).
  • van Staden PM, Forsyth PA, Li Y (2023) A parsimonious neural network approach to solve portfolio optimization problems without using dynamic programming. Preprint, submitted March 15, https://arxiv.org/abs/2303.08968.
  • Wang S, Yu X, Perdikaris P (2022) When and why PINNs fail to train: A neural tangent kernel perspective. J. Comput. Phys. 449(1):110768.
  • Warin X (2023) Reservoir optimization and machine learning methods. EURO J. Comput. Optim. 11(1):100068.
  • Werbos P (1992) Approximate dynamic programming for real-time control and neural modeling. White DA, Sofge DA, eds. Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches (Van Nostrand Reinhold, New York), 493–526.
  • Wu C, Zhu M, Tan Q, Kartha Y, Lu L (2023) A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. Comput. Methods Appl. Mech. Engrg. 403(A):115671.
  • Zhou M, Han J, Lu J (2021) Actor-critic method for high dimensional static Hamilton–Jacobi–Bellman partial differential equations based on neural networks. SIAM J. Sci. Comput. 43(6):A4043–A4066.