Department of Mathematics, National University of Singapore, Singapore 119076; and Institute of Operations Research and Analytics, National University of Singapore, Singapore 119076

Search for more papers by this author

Published Online:8 Jan 2026https://doi.org/10.1287/moor.2024.0479

References

[1] Agarwal A, Dekel O, Xiao L (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Kalai AT, Mohri M, eds. Proc. 23rd Conf. Learn. Theory (COLT 2010) (Omnipress, Madison, WI), 28–40.Google Scholar
[2] Alacaoglu A, Wright SJ (2024) Complexity of single loop algorithms for nonlinear programming with stochastic objective and constraints. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (AISTATS 2024) (PMLR, New York), 4627–4635.Google Scholar
[3] Andreani R, Birgin EG, Martínez JM, Schuverdt ML (2008) On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4):1286–1309.Crossref, Google Scholar
[4] Aubin JP, Cellina A (2012) Differential Inclusions: Set-Valued Maps and Viability Theory, vol. 264 (Springer Science & Business Media, New York).Google Scholar
[5] Barlaud M, Guyard F (2021) Learning sparse deep neural networks using efficient structured projections on convex constraints for green AI. Boyer K, Lovell BC, Pelillo M, Sebe N, Vidal R, Yu J, eds. Proc. 2020 25th Internat. Conf. Pattern Recognition (ICPR) (IEEE, Piscataway, NJ), 1566–1573. Google Scholar
[6] Beck A (2014) Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB (SIAM, Philadelphia).Crossref, Google Scholar
[7] Benaïm M (2006) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Seminaire de Probabilites XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.Google Scholar
[8] Benaïm M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.Crossref, Google Scholar
[9] Bertsekas DP (2014) Constrained Optimization and Lagrange Multiplier Methods (Academic Press, New York).Google Scholar
[10] Bianchi P, Rios-Zertuche R (2021) A closed-measure approach to stochastic approximation. Preprint, submitted December 10, https://arxiv.org/abs/2112.05482.Google Scholar
[11] Bianchi P, Hachem W, Schechtman S (2022) Convergence of constant step stochastic gradient descent for non-smooth non-convex functions. Set-Valued Variational Anal. 30:1117–1147. Crossref, Google Scholar
[12] Birgin EG, Martínez JM (2014) Practical Augmented Lagrangian Methods for Constrained Optimization (SIAM, Philadelphia).Crossref, Google Scholar
[13] Birgin EG, Haeser G, Ramos A (2018) Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl. 69:51–75.Crossref, Google Scholar
[14] Bolte J, Pauwels E (2020) A mathematical model for automatic differentiation in machine learning. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 10809–10819.Google Scholar
[15] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Programming 188(1):19–51.Crossref, Google Scholar
[16] Bolte J, Le T, Pauwels E (2023) Subgradient sampling for nonsmooth nonconvex minimization. SIAM J. Optim. 33(4):2542–2569.Crossref, Google Scholar
[17] Bolte J, Pauwels E, Ríos-Zertuche R (2024) Long term dynamics of the subgradient method for Lipschitz path differentiable functions. J. Eur. Math. Soc. 26(7):2533–2563.Crossref, Google Scholar
[18] Bolte J, Pauwels E, Silveti-Falls AJ (2022) Differentiating nonsmooth solutions to parametric monotone inclusion problems. Preprint, submitted December 15, https://arxiv.org/abs/2212.07844.Google Scholar
[19] Bolte J, Sabach S, Teboulle M (2018) Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence. Math. Oper. Res. 43(4):1210–1232.Link, Google Scholar
[20] Bolte J, Daniilidis A, Lewis A, Shiota M (2007) Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2):556–572.Crossref, Google Scholar
[21] Bolte J, Le T, Moulines E, Pauwels E (2024) Inexact subgradient methods for semialgebraic functions. Preprint, submitted April 30, https://arxiv.org/abs/2404.19517.Google Scholar
[22] Bolte J, Le T, Pauwels E, Silveti-Falls A (2021) Nonsmooth implicit differentiation for machine-learning and optimization. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 13537–13549.Google Scholar
[23] Boob D, Deng Q, Lan G (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197(1):215–279.Crossref, Google Scholar
[24] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48 (Springer, Singapore).Google Scholar
[25] Castera C, Bolte J, Févotte C, Pauwels E (2021) An inertial Newton algorithm for deep learning. J. Machine Learn. Res. 22(1):5977–6007.Google Scholar
[26] Chen T, Sun Y, Yin W (2021) Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Processing 69:4937–4948.Crossref, Google Scholar
[27] Clarke FH (1990) Optimization and Nonsmooth Analysis, vol. 5 (SIAM, Philadelphia).Crossref, Google Scholar
[28] Cohen E, Teboulle M (2025) Alternating and parallel proximal gradient methods for nonsmooth, nonconvex minimax: A unified convergence analysis. Math. Oper. Res. 50(1):141–168.Link, Google Scholar
[29] Cohen E, Hallak N, Teboulle M (2022) A dynamic alternating direction of multipliers for nonconvex minimization with nonlinear functional equality constraints. J. Optim. Theory Appl. 193(1):324–353.Crossref, Google Scholar
[30] Conn AR, Gould NI, Toint P (1991) A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2):545–572.Crossref, Google Scholar
[31] Curtis FE, Jiang H, Robinson DP (2015) An adaptive augmented Lagrangian method for large-scale constrained optimization. Math. Programming 152(1–2):201–245.Crossref, Google Scholar
[32] Daniilidis A, Drusvyatskiy D (2020) Pathological subgradient dynamics. SIAM J. Optim. 30(2):1327–1338.Crossref, Google Scholar
[33] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.Crossref, Google Scholar
[34] Diouane Y, Gollier M, Orban D (2024) A nonsmooth exact penalty method for equality-constrained optimization: Complexity and implementation. Preprint, submitted October 3, https://arxiv.org/abs/2410.02188.Google Scholar
[35] Duchi JC, Ruan F (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.Crossref, Google Scholar
[36] Duchi JC, Jordan MI, Wainwright MJ, Wibisono A (2015) Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inform. Theory 61(5):2788–2806.Crossref, Google Scholar
[37] El Bourkhissi LE, Necoara I (2023) Complexity of linearized augmented Lagrangian for optimization with nonlinear equality constraints. Preprint, submitted January 19, https://arxiv.org/abs/2301.08345.Google Scholar
[38] Fernández D, Solodov MV (2012) Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim. 22(2):384–407.Crossref, Google Scholar
[39] Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.Crossref, Google Scholar
[40] Ghadimi S, Lan G (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
[41] Ghadimi S, Ruszczyński A, Wang M (2020) A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1):960–979.Crossref, Google Scholar
[42] Grapiglia GN, Yuan Y (2021) On the complexity of an augmented lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2):1546–1568.Crossref, Google Scholar
[43] Greenstein D, Hallak N (2023) An augmented Lagrangian approach for problems with random matrix composite structure. Preprint, submitted May 1, https://arxiv.org/abs/2305.01055.Google Scholar
[44] Grigas P, Lobos A, Vermeersch N (2019) Stochastic in-face Frank–Wolfe methods for non-convex optimization and sparse neural network training. Preprint, submitted June 9, https://arxiv.org/abs/1906.03580.Google Scholar
[45] Gürbüzbalaban M, Ruszczyński A, Zhu L (2022) A stochastic subgradient method for distributionally robust non-convex and non-smooth learning. J. Optim. Theory Appl. 194(3):1014–1041.Crossref, Google Scholar
[46] Hallak N, Teboulle M (2023) An adaptive Lagrangian-based scheme for nonconvex composite optimization. Math. Oper. Res. 48(4):2337–2352.Abstract, Google Scholar
[47] He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Agapito L, Berg T, Kosecka J, Zelnik-Manor L, eds. Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR 2016) (IEEE Computer Society, Washington, DC), 770–778.Google Scholar
[48] Hestenes MR (1969) Multiplier and gradient methods. J. Optim. Theory Appl. 4(5):303–320.Crossref, Google Scholar
[49] Hu X, Xiao N, Liu X, Toh KC (2023) An improved unconstrained approach for bilevel optimization. SIAM J. Optim. 33(4):2801–2829.Crossref, Google Scholar
[50] Ito K, Kunisch K (1990) The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math. Programming 46(1):341–360.Crossref, Google Scholar
[51] Jiang B, Meng X, Wen Z, Chen X (2023) An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Programming 198(1):855–897.Crossref, Google Scholar
[52] Jorge N, Stephen JW (2006) Numerical Optimization (Springer, New York).Google Scholar
[53] Josz C, Lai L (2023) Lyapunov stability of the subgradient method with constant step size. Math. Programming 202(1–2):387–396.Crossref, Google Scholar
[54] Josz C, Lai L (2024) Global stability of first-order methods for coercive tame functions. Math. Programming 207(1–2):551–576.Crossref, Google Scholar
[55] Josz C, Lai L, Li X (2024) Proximal random reshuffling under local Lipschitz continuity. Preprint, submitted August 13, https://arxiv.org/abs/2408.07182.Google Scholar
[56] Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. Proc. 3rd Internat. Conf. Learn. Representations.Google Scholar
[57] Kong W, Melo JG, Monteiro RD (2023) Iteration complexity of a proximal augmented Lagrangian method for solving nonconvex composite optimization problems with nonlinear convex constraints. Math. Oper. Res. 48(2):1066–1094.Link, Google Scholar
[58] Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto.Google Scholar
[59] Le T (2023) Nonsmooth nonconvex stochastic heavy ball. Preprint, submitted April 26, https://arxiv.org/abs/2304.13328.Google Scholar
[60] LeCun Y (1998) The MNIST database of handwritten digits. Accessed April 1, 2024, http://yann.lecun.com/exdb/mnist/.Google Scholar
[61] LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.Crossref, Google Scholar
[62] Li Z, Chen PY, Liu S, Lu S, Xu Y (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2170–2178.Google Scholar
[63] Li Z, Chen P-Y, Liu S, Lu S, Xu Y (2024) Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization. Comput. Optim. Appl. 87(1):117–147.Crossref, Google Scholar
[64] Lin Q, Ma R, Xu Y (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.Crossref, Google Scholar
[65] Lin T, Zheng Z, Jordan M (2022) Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 26160–26175.Google Scholar
[66] Ma R, Lin Q, Yang T (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
[67] Nesterov Y, Spokoiny V (2017) Random gradient-free minimization of convex functions. Foundations Comput. Math. 17(2):527–566.Crossref, Google Scholar
[68] Papadimitriou D, Vu B (2023) Stochastic Lagrangian-based method for nonconvex optimization with nonlinear constraints. Preprint, submitted September 1, https://doi.org/10.21203/rs.3.rs-3302999/v1.Google Scholar
[69] Pokutta S, Spiegel C, Zimmer M (2020) Deep neural network training with Frank–Wolfe. Preprint, submitted October 14, https://arxiv.org/abs/2010.07243.Google Scholar
[70] Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.Crossref, Google Scholar
[71] Powell MJD (1969) A method for nonlinear constraints in minimization problems. Fletcher R, ed. Optimization (Academic Press, New York), 283–298.Google Scholar
[72] Rockafellar RT (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2):97–116.Link, Google Scholar
[73] Ruszczyński A (2020) Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optim. Lett. 14(7):1615–1625.Crossref, Google Scholar
[74] Sabach S, Teboulle M (2019) Lagrangian methods for composite optimization. Kimmel R, Tai X-C, eds. Handbook of Numerical Analysis, vol. 20 (Elsevier, Amsterdam), 401–436.Google Scholar
[75] Sabach S, Teboulle M (2022) Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1):204–227.Crossref, Google Scholar
[76] Sahin MF, Eftekhari A, Alacaoglu A, Latorre F, Cevher V (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).Google Scholar
[77] Shamir O (2017) An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Machine Learn. Res. 18(52):1–11.Google Scholar
[78] Shi Q, Wang X, Wang H (2025) A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization. Math. Oper. Res., ePub ahead of print January 23, https://doi.org/10.1287/moor.2022.0193.Link, Google Scholar
[79] Tang T, Toh K-C (2024) Self-adaptive ADMM for semi-strongly convex problems. Math. Programming Comput. 16(1):113–150.Crossref, Google Scholar
[80] Tang T, Toh KC (2024) Solving graph equipartition SDPs on an algebraic variety. Math. Programming 204(1):299–347.Crossref, Google Scholar
[81] van den Dries L, Miller C (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.Crossref, Google Scholar
[82] Wang S, Ding C, Zhang Y, Zhao X (2023) Strong variational sufficiency for nonlinear semidefinite programming and its implications. SIAM J. Optim. 33(4):2988–3011.Crossref, Google Scholar
[83] Xiao N, Hu X, Toh KC (2023) Convergence guarantees for stochastic subgradient methods in nonsmooth nonconvex optimization. Preprint, submitted July 19, https://arxiv.org/abs/2307.10053.Google Scholar
[84] Xiao N, Liu X, Toh K-C (2024) Dissolving constraints for Riemannian optimization. Math. Oper. Res. 49(1):366–397.Link, Google Scholar
[85] Xiao N, Hu X, Liu X, Toh KC (2024) Adam-family methods for nonsmooth optimization with convergence guarantees. J. Machine Learn. Res. 25(48):1–53.Google Scholar
[86] Xie Y, Wright SJ (2021) Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3):38.Crossref, Google Scholar
[87] Xu Z, Zhang H, Xu Y, Lan G (2023) A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Math. Programming 201(1–2):635–706.Crossref, Google Scholar
[88] Yang S, Li X, Lan G (2025) Data-driven minimax optimization with expectation constraints. Oper. Res. 73(3):1345–1365.Link, Google Scholar
[89] Zhu D, Zhao L, Zhang S (2024) A first-order primal-dual method for nonconvex constrained optimization based on the augmented Lagrangian. Math. Oper. Res. 49(1):125–150.Link, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:April 14, 2024
Accepted:December 06, 2025
Published Online:January 08, 2026

Cite as

Nachuan Xiao, Kuangyu Ding, Xiaoyin Hu, Kim-Chuan Toh (2026) Developing Lagrangian-Based Methods for Nonsmooth Nonconvex Optimization. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0479

Keywords

Acknowledgments

The authors to thank the reviewers and editors for their valuable suggestions that have helped to improve this paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Developing Lagrangian-Based Methods for Nonsmooth Nonconvex Optimization

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News