Open Access

Stochastic Inertial Dynamics via Time Scaling and Averaging

Rodrigo Maulen-Soto
Corresponding Author
[email protected]
https://orcid.org/0009-0007-1383-3297
GREYC: Groupe de recherche en informatique, image et instrumentation de Caen, Ensicaen, Université de Caen, Normandie Université, 14000 Caen, France
Jalal Fadili
[email protected]
https://orcid.org/0000-0002-8165-7578
GREYC: Groupe de recherche en informatique, image et instrumentation de Caen, Ensicaen, Université de Caen, Normandie Université, 14000 Caen, France
Hédy Attouch
[email protected]
https://orcid.org/0000-0003-2676-0887
IMAG: Institut Montpelliérain Alexander Grothendieck, Université de Montpellier, 34090 Montpellier, France
Peter Ochs
[email protected]
https://orcid.org/0000-0002-4880-7511
Department of Mathematics and Computer Science, Saarland University, 66123 Saarbrücken, Germany

Published Online: 19 Jan 2026
https://doi.org/10.1287/stsy.2024.0068

Volume 16, Issue 1

March 2026

Pages 1-107

Article Information

Received: March 27, 2024
Accepted: October 24, 2025
Published Online: January 19, 2026

Cite as

Rodrigo Maulen-Soto, Jalal Fadili, Hédy Attouch, Peter Ochs (2026) Stochastic Inertial Dynamics via Time Scaling and Averaging. Stochastic Systems 16(1):61-89.

https://doi.org/10.1287/stsy.2024.0068
