Developing Lagrangian-Based Methods for Nonsmooth Nonconvex Optimization

Published Online:https://doi.org/10.1287/moor.2024.0479

References

  • [1] Agarwal A, Dekel O, Xiao L (2010) Optimal algorithms for online convex optimization with multi-point bandit feedback. Kalai AT, Mohri M, eds. Proc. 23rd Conf. Learn. Theory (COLT 2010) (Omnipress, Madison, WI), 28–40.Google Scholar
  • [2] Alacaoglu A, Wright SJ (2024) Complexity of single loop algorithms for nonlinear programming with stochastic objective and constraints. Dasgupta S, Mandt S, Li Y, eds. Proc. 27th Internat. Conf. Artificial Intelligence Statist. (AISTATS 2024) (PMLR, New York), 4627–4635.Google Scholar
  • [3] Andreani R, Birgin EG, Martínez JM, Schuverdt ML (2008) On augmented Lagrangian methods with general lower-level constraints. SIAM J. Optim. 18(4):1286–1309.CrossrefGoogle Scholar
  • [4] Aubin JP, Cellina A (2012) Differential Inclusions: Set-Valued Maps and Viability Theory, vol. 264 (Springer Science & Business Media, New York).Google Scholar
  • [5] Barlaud M, Guyard F (2021) Learning sparse deep neural networks using efficient structured projections on convex constraints for green AI. Boyer K, Lovell BC, Pelillo M, Sebe N, Vidal R, Yu J, eds. Proc. 2020 25th Internat. Conf. Pattern Recognition (ICPR) (IEEE, Piscataway, NJ), 1566–1573. Google Scholar
  • [6] Beck A (2014) Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [7] Benaïm M (2006) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Seminaire de Probabilites XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.Google Scholar
  • [8] Benaïm M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.CrossrefGoogle Scholar
  • [9] Bertsekas DP (2014) Constrained Optimization and Lagrange Multiplier Methods (Academic Press, New York).Google Scholar
  • [10] Bianchi P, Rios-Zertuche R (2021) A closed-measure approach to stochastic approximation. Preprint, submitted December 10, https://arxiv.org/abs/2112.05482.Google Scholar
  • [11] Bianchi P, Hachem W, Schechtman S (2022) Convergence of constant step stochastic gradient descent for non-smooth non-convex functions. Set-Valued Variational Anal. 30:1117–1147. CrossrefGoogle Scholar
  • [12] Birgin EG, Martínez JM (2014) Practical Augmented Lagrangian Methods for Constrained Optimization (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [13] Birgin EG, Haeser G, Ramos A (2018) Augmented Lagrangians with constrained subproblems and convergence to second-order stationary points. Comput. Optim. Appl. 69:51–75.CrossrefGoogle Scholar
  • [14] Bolte J, Pauwels E (2020) A mathematical model for automatic differentiation in machine learning. Adv. Neural Inform. Processing Systems, vol. 33 (Curran Associates, Inc., Red Hook, NY), 10809–10819.Google Scholar
  • [15] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Programming 188(1):19–51.CrossrefGoogle Scholar
  • [16] Bolte J, Le T, Pauwels E (2023) Subgradient sampling for nonsmooth nonconvex minimization. SIAM J. Optim. 33(4):2542–2569.CrossrefGoogle Scholar
  • [17] Bolte J, Pauwels E, Ríos-Zertuche R (2024) Long term dynamics of the subgradient method for Lipschitz path differentiable functions. J. Eur. Math. Soc. 26(7):2533–2563.CrossrefGoogle Scholar
  • [18] Bolte J, Pauwels E, Silveti-Falls AJ (2022) Differentiating nonsmooth solutions to parametric monotone inclusion problems. Preprint, submitted December 15, https://arxiv.org/abs/2212.07844.Google Scholar
  • [19] Bolte J, Sabach S, Teboulle M (2018) Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence. Math. Oper. Res. 43(4):1210–1232.LinkGoogle Scholar
  • [20] Bolte J, Daniilidis A, Lewis A, Shiota M (2007) Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2):556–572.CrossrefGoogle Scholar
  • [21] Bolte J, Le T, Moulines E, Pauwels E (2024) Inexact subgradient methods for semialgebraic functions. Preprint, submitted April 30, https://arxiv.org/abs/2404.19517.Google Scholar
  • [22] Bolte J, Le T, Pauwels E, Silveti-Falls A (2021) Nonsmooth implicit differentiation for machine-learning and optimization. Ranzato M, Beygelzimer A, Dauphin Y, Liang PS, Wortman Vaughan J, eds. Adv. Neural Inform. Processing Systems, vol. 34 (Curran Associates Inc., Red Hook, NY), 13537–13549.Google Scholar
  • [23] Boob D, Deng Q, Lan G (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197(1):215–279.CrossrefGoogle Scholar
  • [24] Borkar VS (2009) Stochastic Approximation: A Dynamical Systems Viewpoint, vol. 48 (Springer, Singapore).Google Scholar
  • [25] Castera C, Bolte J, Févotte C, Pauwels E (2021) An inertial Newton algorithm for deep learning. J. Machine Learn. Res. 22(1):5977–6007.Google Scholar
  • [26] Chen T, Sun Y, Yin W (2021) Solving stochastic compositional optimization is nearly as easy as solving stochastic optimization. IEEE Trans. Signal Processing 69:4937–4948.CrossrefGoogle Scholar
  • [27] Clarke FH (1990) Optimization and Nonsmooth Analysis, vol. 5 (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [28] Cohen E, Teboulle M (2025) Alternating and parallel proximal gradient methods for nonsmooth, nonconvex minimax: A unified convergence analysis. Math. Oper. Res. 50(1):141–168.LinkGoogle Scholar
  • [29] Cohen E, Hallak N, Teboulle M (2022) A dynamic alternating direction of multipliers for nonconvex minimization with nonlinear functional equality constraints. J. Optim. Theory Appl. 193(1):324–353.CrossrefGoogle Scholar
  • [30] Conn AR, Gould NI, Toint P (1991) A globally convergent augmented Lagrangian algorithm for optimization with general constraints and simple bounds. SIAM J. Numer. Anal. 28(2):545–572.CrossrefGoogle Scholar
  • [31] Curtis FE, Jiang H, Robinson DP (2015) An adaptive augmented Lagrangian method for large-scale constrained optimization. Math. Programming 152(1–2):201–245.CrossrefGoogle Scholar
  • [32] Daniilidis A, Drusvyatskiy D (2020) Pathological subgradient dynamics. SIAM J. Optim. 30(2):1327–1338.CrossrefGoogle Scholar
  • [33] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.CrossrefGoogle Scholar
  • [34] Diouane Y, Gollier M, Orban D (2024) A nonsmooth exact penalty method for equality-constrained optimization: Complexity and implementation. Preprint, submitted October 3, https://arxiv.org/abs/2410.02188.Google Scholar
  • [35] Duchi JC, Ruan F (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.CrossrefGoogle Scholar
  • [36] Duchi JC, Jordan MI, Wainwright MJ, Wibisono A (2015) Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Trans. Inform. Theory 61(5):2788–2806.CrossrefGoogle Scholar
  • [37] El Bourkhissi LE, Necoara I (2023) Complexity of linearized augmented Lagrangian for optimization with nonlinear equality constraints. Preprint, submitted January 19, https://arxiv.org/abs/2301.08345.Google Scholar
  • [38] Fernández D, Solodov MV (2012) Local convergence of exact and inexact augmented Lagrangian methods under the second-order sufficient optimality condition. SIAM J. Optim. 22(2):384–407.CrossrefGoogle Scholar
  • [39] Frank M, Wolfe P (1956) An algorithm for quadratic programming. Naval Res. Logist. Quart. 3(1–2):95–110.CrossrefGoogle Scholar
  • [40] Ghadimi S, Lan G (2013) Stochastic first-and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.CrossrefGoogle Scholar
  • [41] Ghadimi S, Ruszczyński A, Wang M (2020) A single timescale stochastic approximation method for nested stochastic optimization. SIAM J. Optim. 30(1):960–979.CrossrefGoogle Scholar
  • [42] Grapiglia GN, Yuan Y (2021) On the complexity of an augmented lagrangian method for nonconvex optimization. IMA J. Numer. Anal. 41(2):1546–1568.CrossrefGoogle Scholar
  • [43] Greenstein D, Hallak N (2023) An augmented Lagrangian approach for problems with random matrix composite structure. Preprint, submitted May 1, https://arxiv.org/abs/2305.01055.Google Scholar
  • [44] Grigas P, Lobos A, Vermeersch N (2019) Stochastic in-face Frank–Wolfe methods for non-convex optimization and sparse neural network training. Preprint, submitted June 9, https://arxiv.org/abs/1906.03580.Google Scholar
  • [45] Gürbüzbalaban M, Ruszczyński A, Zhu L (2022) A stochastic subgradient method for distributionally robust non-convex and non-smooth learning. J. Optim. Theory Appl. 194(3):1014–1041.CrossrefGoogle Scholar
  • [46] Hallak N, Teboulle M (2023) An adaptive Lagrangian-based scheme for nonconvex composite optimization. Math. Oper. Res. 48(4):2337–2352.AbstractGoogle Scholar
  • [47] He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. Agapito L, Berg T, Kosecka J, Zelnik-Manor L, eds. Proc. IEEE Conf. Comput. Vision Pattern Recognition (CVPR 2016) (IEEE Computer Society, Washington, DC), 770–778.Google Scholar
  • [48] Hestenes MR (1969) Multiplier and gradient methods. J. Optim. Theory Appl. 4(5):303–320.CrossrefGoogle Scholar
  • [49] Hu X, Xiao N, Liu X, Toh KC (2023) An improved unconstrained approach for bilevel optimization. SIAM J. Optim. 33(4):2801–2829.CrossrefGoogle Scholar
  • [50] Ito K, Kunisch K (1990) The augmented Lagrangian method for equality and inequality constraints in Hilbert spaces. Math. Programming 46(1):341–360.CrossrefGoogle Scholar
  • [51] Jiang B, Meng X, Wen Z, Chen X (2023) An exact penalty approach for optimization with nonnegative orthogonality constraints. Math. Programming 198(1):855–897.CrossrefGoogle Scholar
  • [52] Jorge N, Stephen JW (2006) Numerical Optimization (Springer, New York).Google Scholar
  • [53] Josz C, Lai L (2023) Lyapunov stability of the subgradient method with constant step size. Math. Programming 202(1–2):387–396.CrossrefGoogle Scholar
  • [54] Josz C, Lai L (2024) Global stability of first-order methods for coercive tame functions. Math. Programming 207(1–2):551–576.CrossrefGoogle Scholar
  • [55] Josz C, Lai L, Li X (2024) Proximal random reshuffling under local Lipschitz continuity. Preprint, submitted August 13, https://arxiv.org/abs/2408.07182.Google Scholar
  • [56] Kingma DP, Ba J (2015) Adam: A method for stochastic optimization. Proc. 3rd Internat. Conf. Learn. Representations.Google Scholar
  • [57] Kong W, Melo JG, Monteiro RD (2023) Iteration complexity of a proximal augmented Lagrangian method for solving nonconvex composite optimization problems with nonlinear convex constraints. Math. Oper. Res. 48(2):1066–1094.LinkGoogle Scholar
  • [58] Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto, Toronto.Google Scholar
  • [59] Le T (2023) Nonsmooth nonconvex stochastic heavy ball. Preprint, submitted April 26, https://arxiv.org/abs/2304.13328.Google Scholar
  • [60] LeCun Y (1998) The MNIST database of handwritten digits. Accessed April 1, 2024, http://yann.lecun.com/exdb/mnist/.Google Scholar
  • [61] LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc. IEEE 86(11):2278–2324.CrossrefGoogle Scholar
  • [62] Li Z, Chen PY, Liu S, Lu S, Xu Y (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Internat. Conf. Artificial Intelligence Statist. (PMLR, New York), 2170–2178.Google Scholar
  • [63] Li Z, Chen P-Y, Liu S, Lu S, Xu Y (2024) Stochastic inexact augmented Lagrangian method for nonconvex expectation constrained optimization. Comput. Optim. Appl. 87(1):117–147.CrossrefGoogle Scholar
  • [64] Lin Q, Ma R, Xu Y (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.CrossrefGoogle Scholar
  • [65] Lin T, Zheng Z, Jordan M (2022) Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Koyejo S, Mohamed S, Agarwal A, Belgrave D, Cho K, Oh A, eds. Adv. Neural Inform. Processing Systems, vol. 35 (Curran Associates Inc., Red Hook, NY), 26160–26175.Google Scholar
  • [66] Ma R, Lin Q, Yang T (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
  • [67] Nesterov Y, Spokoiny V (2017) Random gradient-free minimization of convex functions. Foundations Comput. Math. 17(2):527–566.CrossrefGoogle Scholar
  • [68] Papadimitriou D, Vu B (2023) Stochastic Lagrangian-based method for nonconvex optimization with nonlinear constraints. Preprint, submitted September 1, https://doi.org/10.21203/rs.3.rs-3302999/v1.Google Scholar
  • [69] Pokutta S, Spiegel C, Zimmer M (2020) Deep neural network training with Frank–Wolfe. Preprint, submitted October 14, https://arxiv.org/abs/2010.07243.Google Scholar
  • [70] Polyak BT (1964) Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5):1–17.CrossrefGoogle Scholar
  • [71] Powell MJD (1969) A method for nonlinear constraints in minimization problems. Fletcher R, ed. Optimization (Academic Press, New York), 283–298.Google Scholar
  • [72] Rockafellar RT (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2):97–116.LinkGoogle Scholar
  • [73] Ruszczyński A (2020) Convergence of a stochastic subgradient method with averaging for nonsmooth nonconvex constrained optimization. Optim. Lett. 14(7):1615–1625.CrossrefGoogle Scholar
  • [74] Sabach S, Teboulle M (2019) Lagrangian methods for composite optimization. Kimmel R, Tai X-C, eds. Handbook of Numerical Analysis, vol. 20 (Elsevier, Amsterdam), 401–436.Google Scholar
  • [75] Sabach S, Teboulle M (2022) Faster Lagrangian-based methods in convex optimization. SIAM J. Optim. 32(1):204–227.CrossrefGoogle Scholar
  • [76] Sahin MF, Eftekhari A, Alacaoglu A, Latorre F, Cevher V (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. Adv. Neural Inform. Processing Systems, vol. 32 (Curran Associates Inc., Red Hook, NY).Google Scholar
  • [77] Shamir O (2017) An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. J. Machine Learn. Res. 18(52):1–11.Google Scholar
  • [78] Shi Q, Wang X, Wang H (2025) A momentum-based linearized augmented lagrangian method for nonconvex constrained stochastic optimization. Math. Oper. Res., ePub ahead of print January 23, https://doi.org/10.1287/moor.2022.0193.LinkGoogle Scholar
  • [79] Tang T, Toh K-C (2024) Self-adaptive ADMM for semi-strongly convex problems. Math. Programming Comput. 16(1):113–150.CrossrefGoogle Scholar
  • [80] Tang T, Toh KC (2024) Solving graph equipartition SDPs on an algebraic variety. Math. Programming 204(1):299–347.CrossrefGoogle Scholar
  • [81] van den Dries L, Miller C (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.CrossrefGoogle Scholar
  • [82] Wang S, Ding C, Zhang Y, Zhao X (2023) Strong variational sufficiency for nonlinear semidefinite programming and its implications. SIAM J. Optim. 33(4):2988–3011.CrossrefGoogle Scholar
  • [83] Xiao N, Hu X, Toh KC (2023) Convergence guarantees for stochastic subgradient methods in nonsmooth nonconvex optimization. Preprint, submitted July 19, https://arxiv.org/abs/2307.10053.Google Scholar
  • [84] Xiao N, Liu X, Toh K-C (2024) Dissolving constraints for Riemannian optimization. Math. Oper. Res. 49(1):366–397.LinkGoogle Scholar
  • [85] Xiao N, Hu X, Liu X, Toh KC (2024) Adam-family methods for nonsmooth optimization with convergence guarantees. J. Machine Learn. Res. 25(48):1–53.Google Scholar
  • [86] Xie Y, Wright SJ (2021) Complexity of proximal augmented Lagrangian for nonconvex optimization with nonlinear equality constraints. J. Sci. Comput. 86(3):38.CrossrefGoogle Scholar
  • [87] Xu Z, Zhang H, Xu Y, Lan G (2023) A unified single-loop alternating gradient projection algorithm for nonconvex–concave and convex–nonconcave minimax problems. Math. Programming 201(1–2):635–706.CrossrefGoogle Scholar
  • [88] Yang S, Li X, Lan G (2025) Data-driven minimax optimization with expectation constraints. Oper. Res. 73(3):1345–1365.LinkGoogle Scholar
  • [89] Zhu D, Zhao L, Zhang S (2024) A first-order primal-dual method for nonconvex constrained optimization based on the augmented Lagrangian. Math. Oper. Res. 49(1):125–150.LinkGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.