Stochastic Inertial Dynamics via Time Scaling and Averaging
Published Online: 19 Jan 2026
https://doi.org/10.1287/stsy.2024.0068
References
- (2021) An extension of the second order dynamical system that models Nesterov’s convex gradient method. Appl. Math. Optim. 84(2):1687–1716.
- (2017) Katyusha: The first direct acceleration of stochastic gradient methods. J. Machine Learn. Res. 18(221):1–51.
- (2018) The differential inclusion modeling the FISTA algorithm and optimality of convergence rate in the case b≤3. SIAM J. Optim. 28(1):551–574.
- (2020) On the convergence of Nesterov’s accelerated gradient method in stochastic settings. Shawe-Taylor J, Zemel RS, Bartlett P, Pereira FCN, Weinberger KQ, eds. Proc. 37th Internat. Conf. Machine Learn., vol. 119 (Curran Associates, Inc., Red Hook, NY), 410–420.
- (2017) Asymptotic stabilization of inertial gradient dynamics with time-dependent viscosity. J. Differential Equations 263(9):5412–5458.
- (2016) The rate of convergence of Nesterov’s accelerated forward-backward method is actually faster than 1/k². SIAM J. Optim. 26(3):1824–1834.
- (2024) Fast convex optimization via time scale and averaging of the steepest descent. Math. Oper. Res. 50(4):2633–2665.
- (2019) Rate of convergence of the Nesterov accelerated gradient method in the subcritical case α≤3. ESAIM: Control Optim. Calculus Variations (ESAIM-COCV), 25(2).
- (2023a) On the effect of perturbations in first-order optimization methods with inertia and Hessian driven damping. Evolution Equations Control Theory 12(1):71–117.
- (2024) The stochastic Ravine accelerated gradient method with general extrapolation coefficients. Preprint, submitted March 7, https://arxiv.org/abs/2403.04860.
- (2011) The heavy ball with friction method. I: The continuous dynamical system. Comm. Contemporary Math. 2(1):1–34.
- (2016) Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differential Equations 261(10):5734–5783.
- (2022a) Fast convex optimization via inertial dynamics combining viscous and Hessian-driven damping with time rescaling. Evolution Equations Control Theory 11(2):487–514.
- (2018a) Accelerated forward-backward algorithms with perturbations: Application to Tikhonov regularization. J. Optim. Theory Appl. 179(1):1–36.
- (2023b) Convergence of iterates for first-order optimization algorithms with inertia and Hessian driven damping. Optimization 72(5):1199–1238.
- (2022b) First-order optimization algorithms via inertial systems with Hessian driven damping. Math. Programming 193(1):113–155.
- (2018b) Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Programming Ser. B 168(1–2):123–175.
- (2021) Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance. Electronic J. Statist. 15(2):3892–3947.
- (2016) From error bounds to the complexity of first-order descent methods for convex functions. Math. Programming 165(2):471–507.
- (1973) Opérateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de Hilbert, Mathematics Studies, vol. 5 (North-Holland, New York).
- (2009) Asymptotics for a gradient system with memory term. Proc. Amer. Math. Soc. 137(9):3013–3024.
- , Gadat S (2009) On the long time behavior of second order differential equations with asymptotically small dissipation. Trans. Amer. Math. Soc. 361(11):5983–6017.
- (2024) Continuous Newton-like methods featuring inertia and variable mass. SIAM J. Optim. 34(1):251–277.
- (2021) An inertial Newton algorithm for deep learning. J. Machine Learn. Res. 22(134):1–31.
- (1847) Méthode générale pour la résolution des systèmes d’équations simultanées. Comptes Rendus Hebdomadaires des Séances de l’Académie des Sciences 25:536–538.
- (2018) Underdamped Langevin MCMC: A non-asymptotic analysis. Proc. Machine Learn. Res. 75:1–24.
- (2014) Stochastic Equations in Infinite Dimensions, 2nd ed. (Cambridge University Press, Cambridge, UK).
- (2019) User-friendly guarantees for the Langevin Monte Carlo with inaccurate gradient. Stochastic Processes Appl. 129(12):5278–5311.
- (2019) Bounding the error of discretized Langevin algorithms for non-strongly log-concave targets. J. Machine Learn. Res. 23(235):1–38.
- (2024) Stochastic differential equations for modeling first order optimization methods. SIAM J. Optim. 34(2):1402–1426.
- (2022) Adaptivity without compromise: A momentumized, adaptive, dual averaged gradient method for stochastic optimization. J. Machine Learn. Res. 23(144):1–34.
- (2015) Stability of over-relaxations for the forward-backward algorithm, application to FISTA. SIAM J. Optim. 25(4):2408–2433.
- (2022) Accelerating variance-reduced stochastic gradient methods. Math. Programming 191(2):671–715.
- (2021) Convergence rates and approximation results for SGD and its continuous-time counterpart. Belkin M, Kpotufe S, eds. Proc. Thirty Fourth Conf. Learn. Theory, Proceedings of Machine Learning Research, vol. 134 (PMLR, New York), 1965–2058.
- (2015) Un-regularizing: Approximate proximal point and faster stochastic algorithms for empirical risk minimization. Bach F, Blei D, eds. Proc. 32nd Internat. Conf. Machine Learn., Proceedings of Machine Learning Research, vol. 37 (PMLR, New York), 2540–2548.
- (2014) Long time behaviour and stationary regime of memory gradient diffusions. Annales de L’Institut Henri Poincaré - Probabilités et Statistiques 50(2):564–601.
- (2018) Stochastic heavy ball. Electronic J. Statist. 12(1):461–529.
- (2011) Stochastic Differential Equations in Infinite Dimensions: With Applications to Stochastic Partial Differential Equations (Springer, New York).
- (2005) Asymptotic behavior of solutions of a gradient-like integrodifferential Volterra inclusion. Adv. Math. Sci. Appl. 15(2):509–525.
- (2024) Sharper bounds for proximal gradient algorithms with errors. SIAM J. Optim. 34(1):278–305.
- (2012) On a second order dissipative ODE in Hilbert space with an integrable source term. Acta Math. Sci. 32(1):155–163.
- (2019c) On the global convergence of continuous-time stochastic heavy-ball method for nonconvex optimization. 2019 IEEE Internat. Conf. Big Data (IEEE, Piscataway, NJ), 94–104.
- (2019a) On the global convergence of continuous-time stochastic heavy-ball method for nonconvex optimization. Baru CK, Huan J, Khan L, Hu X, Ak R, Tian Y, Barga RS, Zaniolo C, Lee K, Ye Y, eds. Proc. 2019 IEEE Internat. Conf. Big Data (Institute of Electrical and Electronics Engineers, Piscataway, NJ), 94–104.
- (2019b) On the diffusion approximation of nonconvex stochastic gradient descent. Ann. Math. Sci. Appl. 4(1):3–32.
- (2017) Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification. J. Machine Learn. Res. 18(223):1–42.
- (1992) Numerical Solution of Stochastic Differential Equations (Springer-Verlag, Berlin).
- (2020) A Lyapunov analysis for accelerated gradient methods: From deterministic to stochastic case. Chiappa S, Calandra R, eds. Proc. Twenty Third Internat. Conf. Artificial Intelligence Statist., Proceedings of Machine Learning Research, vol. 108 (PMLR, New York), 602–612.
- (2020) First-order and Stochastic Optimization Methods for Machine Learning (Springer Nature, Cham, Switzerland).
- (2021) Analysis of stochastic gradient descent in continuous time. Statist. Comput. 31:39.
- (2024) Nonsmooth nonconvex stochastic heavy ball. J. Optim. Theory Appl. 201(2):699–719.
- (2021) On the validity of modeling SGD with stochastic differential equations. Advances in Neural Information Processing Systems, vol. 21 (Curran Associates Inc., Red Hook, NY).
- (2017) Stochastic modified equations and adaptive stochastic gradient algorithms. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (PMLR, New York), 2101–2110.
- (2024) A Hessian-aware stochastic differential equation for modelling SGD. Preprint, submitted May 28, https://arxiv.org/abs/2405.18373.
- (2017) Catalyst acceleration for first-order convex optimization: From theory to practice. J. Machine Learn. Res. 18(212):1–54.
- (2020) Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods. Comput. Optim. Appl. 77(3):653–710.
- (1963) Une propriété topologique des sous-ensembles analytiques réels. Colloque du C.N.R.S. sur les Équations aux Dérivées Partielles (Paris), 87–89.
- (1965) Ensembles Semi-analytiques (Prépublication) (Institut des Hautes Études Scientifiques, Bures-sur-Yvette, France).
- (1984) Sur les trajectoires du gradient d’une fonction analytique. Seminari di Geometria 1982/1983 (Universita di Bologna, Dipartemento di Matematica, Bologna, Italy), 115–117.
- (2021) Is there an analog of Nesterov acceleration for MCMC? Bernoulli 27(3):1942–1992.
- (2016) A variational analysis of stochastic gradient algorithms. Proc. 33rd Internat. Conf. Machine Learn. (PMLR, New York).
- (2007) Stochastic Differential Equations and Applications, 2nd ed. (Woodhead Publishing, Chichester, UK).
- (2024) An SDE perspective on stochastic convex optimization. Math. Oper. Res. 50(4):3190–3221.
- (2025) Stochastic differential inclusions and Tikhonov regularization for stochastic non-smooth convex optimization in Hilbert spaces. Open J. Math. Optim. 6, article no.9.
- (2017) Asymptotic for a second order evolution equation with convex potential and vanishing damping term. Turkish J. Math. 41(3):681–685.
- (2018) On the convergence of gradient-like flows with noisy gradient input. SIAM J. Optim. 28(1):163–197.
- (2021) Optimization with momentum: Dynamical, control-theoretic, and symplectic perspectives. J. Machine Learn. Res. 22(73):1–50.
- (1983) A method of solving a convex programming problem with convergence rate O(1/k²). Doklady Akademii Nauk SSSR (Proc. USSR Acad. Sci.) 269(3):543–547.
- (2003) Stochastic Differential Equations, 6th ed. (Springer-Verlag, Berlin).
- (2019) Continuous-time models for stochastic optimization algorithms. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Advances in Neural Information Processing Systems, vol. 32 (Curran Associates, Inc., Red Hook, NY).
- (2020) The role of memory in stochastic optimization. Adams RP, Gogate V, eds. Proc. 35th Conf. Uncertainty Artificial Intelligence, Proceedings of Machine Learning Research, vol. 115 (PMLR, New York), 356–366.
- (2014) Stochastic Processes and Applications: Diffusion Processes, the Fokker-Planck Equation and Large Deviations (Springer, New York).
- (1995) Yosida approximations for multivalued stochastic differential equations. Stochastics Stochastics Rep. 52(1–2):107–120.
- (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Physics 4(5):1–17.
- (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.
- (1997) Convex Analysis (Princeton University Press, Princeton, NJ).
- (2011) Convergence rates of inexact proximal-gradient methods for convex optimization. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Inc., Red Hook, NY).
- (2023) On learning rates and Schrödinger operators. J. Machine Learn. Res. 24(379):1–53.
- (2022) Understanding the acceleration phenomenon via high resolution differential equations. Math. Programming 195:79–148.
- (2018) Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks. Choi SP, Yilmaz O, Poor HV, eds. Proc. 2018 Inform. Theory Appl. Workshop (ITA) (IEEE, Piscataway, NJ), 1–10.
- (2016) A differential equation for modeling Nesterov’s accelerated gradient method: Theory and insights. J. Machine Learn. Res. 17(153):1–43.
- (2013) Accelerated and inexact forward-backward. SIAM J. Optim. 23(3):1607–1633.
- (2021) A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima. Proc. Ninth Internat. Conf. Learn. Representations (ICLR) (OpenReview.net).
- (2018) Theoretical analysis for convex and non-convex clustering algorithms. Doctoral dissertation, The University of Texas at Austin, Austin.

