Developing Lagrangian-Based Methods for Nonsmooth Nonconvex Optimization
Published Online:8 Jan 2026https://doi.org/10.1287/moor.2024.0479
References
- [1] (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Kalai AT, Mohri M, eds. Proc. 23rd Conf. Learn. Theory (COLT 2010) (Omnipress, Madison, WI), 28–40.Google Scholar
- [2] (2024) Complexity of single loop algorithms for nonlinear programming with stochastic objective and constraints. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (AISTATS 2024) (PMLR, New York), 4627–4635.Google Scholar
- [3] (2008) On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4):1286–1309.Crossref, Google Scholar
- [4] (2012) Differential Inclusions: Set-Valued Maps and Viability Theory, vol. 264 (Springer Science & Business Media, New York).Google Scholar
- [5] M, Guyard F (2021) Learning sparse deep neural networks using efficient structured projections on convex constraints for green AI. Boyer K, Lovell BC, Pelillo M, Sebe N, Vidal R, Yu J, eds. Proc. 2020 25th Internat. Conf. Pattern Recognition (ICPR) (IEEE, Piscataway, NJ), 1566–1573. Google Scholar
- [6] (2014) Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB (SIAM, Philadelphia).Crossref, Google Scholar
- [7] (2006) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Seminaire de Probabilites XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.Google Scholar
- [8] (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.Crossref, Google Scholar
- [9] (2014) Constrained Optimization and Lagrange Multiplier Methods (Academic Press, New York).Google Scholar
- [10] (2021) A closed-measure approach to stochastic approximation. Preprint, submitted December 10, https://arxiv.org/abs/2112.05482.Google Scholar
- [11] (2022) Convergence of constant step stochastic gradient descent for non-smooth non-convex functions. Set-Valued Variational Anal. 30:1117–1147. Crossref, Google Scholar
- [12] (2014) Practical Augmented Lagrangian Methods for Constrained Optimization (SIAM, Philadelphia).Crossref, Google Scholar
- [13] (2018) Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl. 69:51–75.Crossref, Google Scholar
- [14] (2020) A mathematical model for automatic differentiation in machine learning. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 10809–10819.Google Scholar
- [15] (2021) Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Programming 188(1):19–51.Crossref, Google Scholar
- [16] (2023) Subgradient sampling for nonsmooth nonconvex minimization. SIAM J. Optim. 33(4):2542–2569.Crossref, Google Scholar
- [17] (2024) Long term dynamics of the subgradient method for Lipschitz path differentiable functions. J. Eur. Math. Soc. 26(7):2533–2563.Crossref, Google Scholar
- [18] (2022) Differentiating nonsmooth solutions to parametric monotone inclusion problems. Preprint, submitted December 15, https://arxiv.org/abs/2212.07844.Google Scholar
- [19] (2018) Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence. Math. Oper. Res. 43(4):1210–1232.Link, Google Scholar
- [20] (2007) Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2):556–572.Crossref, Google Scholar
- [21] (2024) Inexact subgradient methods for semialgebraic functions. Preprint, submitted April 30, https://arxiv.org/abs/2404.19517.Google Scholar
- [22] , Le T, Pauwels E, Silveti-Falls A (2021) Nonsmooth implicit differentiation for machine-learning and optimization. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 13537–13549.Google Scholar
- [23] (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197(1):215–279.Crossref, Google Scholar
- [24] (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48 (Springer, Singapore).Google Scholar
- [25] (2021) An inertial Newton algorithm for deep learning. J. Machine Learn. Res. 22(1):5977–6007.Google Scholar
- [26] (2021) Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Processing 69:4937–4948.Crossref, Google Scholar
- [27] (1990) Optimization and Nonsmooth Analysis, vol. 5 (SIAM, Philadelphia).Crossref, Google Scholar
- [28] (2025) Alternating and parallel proximal gradient methods for nonsmooth, nonconvex minimax: A unified convergence analysis. Math. Oper. Res. 50(1):141–168.Link, Google Scholar
- [29] (2022) A dynamic alternating direction of multipliers for nonconvex minimization with nonlinear functional equality constraints. J. Optim. Theory Appl. 193(1):324–353.Crossref, Google Scholar
- [30] (1991) A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2):545–572.Crossref, Google Scholar
- [31] (2015) An adaptive augmented Lagrangian method for large-scale constrained optimization. Math. Programming 152(1–2):201–245.Crossref, Google Scholar
- [32] (2020) Pathological subgradient dynamics. SIAM J. Optim. 30(2):1327–1338.Crossref, Google Scholar
- [33] (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.Crossref, Google Scholar
- [34] (2024) A nonsmooth exact penalty method for equality-constrained optimization: Complexity and implementation. Preprint, submitted October 3, https://arxiv.org/abs/2410.02188.Google Scholar
- [35] (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.Crossref, Google Scholar
- [36] (2015) Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inform. Theory 61(5):2788–2806.Crossref, Google Scholar
- [37] El Bourkhissi LE, Necoara I (2023) Complexity of linearized augmented Lagrangian for optimization with nonlinear equality constraints. Preprint, submitted January 19, https://arxiv.org/abs/2301.08345.Google Scholar
- [38] (2012) Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim. 22(2):384–407.Crossref, Google Scholar
- [39] (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.Crossref, Google Scholar
- [40] (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
- [41] (2020) A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1):960–979.Crossref, Google Scholar
- [42] (2021) On the complexity of an augmented lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2):1546–1568.Crossref, Google Scholar
- [43] (2023) An augmented Lagrangian approach for problems with random matrix composite structure. Preprint, submitted May 1, https://arxiv.org/abs/2305.01055.Google Scholar
- [44] (2019) Stochastic in-face Frank–Wolfe methods for non-convex optimization and sparse neural network training. Preprint, submitted June 9, https://arxiv.org/abs/1906.03580.Google Scholar
- [45] (2022) A stochastic subgradient method for distributionally robust non-convex and non-smooth learning. J. Optim. Theory Appl. 194(3):1014–1041.Crossref, Google Scholar
- [46] (2023) An adaptive Lagrangian-based scheme for nonconvex composite optimization. Math. Oper. Res. 48(4):2337–2352.Abstract, Google Scholar
- [47] , Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Agapito L, Berg T, Kosecka J, Zelnik-Manor L, eds. Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR 2016) (IEEE Computer Society, Washington, DC), 770–778.Google Scholar
- [48] (1969) Multiplier and gradient methods. J. Optim. Theory Appl. 4(5):303–320.Crossref, Google Scholar
- [49] (2023) An improved unconstrained approach for bilevel optimization. SIAM J. Optim. 33(4):2801–2829.Crossref, Google Scholar
- [50] (1990) The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math. Programming 46(1):341–360.Crossref, Google Scholar
- [51] (2023) An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Programming 198(1):855–897.Crossref, Google Scholar
- [52] (2006) Numerical Optimization (Springer, New York).Google Scholar
- [53] (2023) Lyapunov stability of the subgradient method with constant step size. Math. Programming 202(1–2):387–396.Crossref, Google Scholar
- [54] (2024) Global stability of first-order methods for coercive tame functions. Math. Programming 207(1–2):551–576.Crossref, Google Scholar
- [55] (2024) Proximal random reshuffling under local Lipschitz continuity. Preprint, submitted August 13, https://arxiv.org/abs/2408.07182.Google Scholar
- [56] (2015) Adam: A method for stochastic optimization. Proc. 3rd Internat. Conf. Learn. Representations.Google Scholar
- [57] (2023) Iteration complexity of a proximal augmented Lagrangian method for solving nonconvex composite optimization problems with nonlinear convex constraints. Math. Oper. Res. 48(2):1066–1094.Link, Google Scholar
- [58] Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto.Google Scholar
- [59] (2023) Nonsmooth nonconvex stochastic heavy ball. Preprint, submitted April 26, https://arxiv.org/abs/2304.13328.Google Scholar
- [60] LeCun Y (1998) The MNIST database of handwritten digits. Accessed April 1, 2024, http://yann.lecun.com/exdb/mnist/.Google Scholar
- [61] (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.Crossref, Google Scholar
- [62] (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2170–2178.Google Scholar
- [63] (2024) Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization. Comput. Optim. Appl. 87(1):117–147.Crossref, Google Scholar
- [64] (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.Crossref, Google Scholar
- [65] (2022) Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 26160–26175.Google Scholar
- [66] (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
- [67] (2017) Random gradient-free minimization of convex functions. Foundations Comput. Math. 17(2):527–566.Crossref, Google Scholar
- [68] (2023) Stochastic Lagrangian-based method for nonconvex optimization with nonlinear constraints. Preprint, submitted September 1, https://doi.org/10.21203/rs.3.rs-3302999/v1.Google Scholar
- [69] (2020) Deep neural network training with Frank–Wolfe. Preprint, submitted October 14, https://arxiv.org/abs/2010.07243.Google Scholar
- [70] (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
- [71] (1969) A method for nonlinear constraints in minimization problems. Fletcher R, ed. Optimization (Academic Press, New York), 283–298.Google Scholar
- [72] (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2):97–116.Link, Google Scholar
- [73] (2020) Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optim. Lett. 14(7):1615–1625.Crossref, Google Scholar
- [74] (2019) Lagrangian methods for composite optimization. Kimmel R, Tai X-C, eds. Handbook of Numerical Analysis, vol. 20 (Elsevier, Amsterdam), 401–436.Google Scholar
- [75] (2022) Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1):204–227.Crossref, Google Scholar
- [76] (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).Google Scholar
- [77] (2017) An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Machine Learn. Res. 18(52):1–11.Google Scholar
- [78] (2025) A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization. Math. Oper. Res., ePub ahead of print January 23, https://doi.org/10.1287/moor.2022.0193.Link, Google Scholar
- [79] (2024) Self-adaptive ADMM for semi-strongly convex problems. Math. Programming Comput. 16(1):113–150.Crossref, Google Scholar
- [80] (2024) Solving graph equipartition SDPs on an algebraic variety. Math. Programming 204(1):299–347.Crossref, Google Scholar
- [81] (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.Crossref, Google Scholar
- [82] (2023) Strong variational sufficiency for nonlinear semidefinite programming and its implications. SIAM J. Optim. 33(4):2988–3011.Crossref, Google Scholar
- [83] (2023) Convergence guarantees for stochastic subgradient methods in nonsmooth nonconvex optimization. Preprint, submitted July 19, https://arxiv.org/abs/2307.10053.Google Scholar
- [84] (2024) Dissolving constraints for Riemannian optimization. Math. Oper. Res. 49(1):366–397.Link, Google Scholar
- [85] (2024) Adam-family methods for nonsmooth optimization with convergence guarantees. J. Machine Learn. Res. 25(48):1–53.Google Scholar
- [86] (2021) Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3):38.Crossref, Google Scholar
- [87] (2023) A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Math. Programming 201(1–2):635–706.Crossref, Google Scholar
- [88] (2025) Data-driven minimax optimization with expectation constraints. Oper. Res. 73(3):1345–1365.Link, Google Scholar
- [89] (2024) A first-order primal-dual method for nonconvex constrained optimization based on the augmented Lagrangian. Math. Oper. Res. 49(1):125–150.Link, Google Scholar

