A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization

Qiankun Shi
Qiankun Shi
[email protected]
School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; and Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China
Search for more papers by this author
,
Xiao Wang
Corresponding Author
Xiao Wang
[email protected]
https://orcid.org/0000-0002-3492-9235
Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China
Search for more papers by this author
,
Hao Wang
Hao Wang
[email protected]
https://orcid.org/0000-0001-8821-7260
School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
Search for more papers by this author

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510006, China; and Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China

Search for more papers by this author

Xiao Wang

Corresponding Author

Xiao Wang

[email protected]

https://orcid.org/0000-0002-3492-9235

Department of AI Computing, Pengcheng Laboratory, Shenzhen 518000, China

Search for more papers by this author

Hao Wang

[email protected]

https://orcid.org/0000-0001-8821-7260

School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China

Search for more papers by this author

Published Online:23 Jan 2025https://doi.org/10.1287/moor.2022.0193

References

[1] Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Srebro N, Woodworth B (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199:165–214.Crossref, Google Scholar
[2] Berahas AS, Curtis FE, Robinson D, Zhou B (2021) Sequential quadratic optimization for nonlinear equality constrained stochastic optimization. SIAM J. Optim. 31(2):1352–1379.Crossref, Google Scholar
[3] Bertsimas D, Gupta V, Kallus N (2018) Robust sample average approximation. Math. Programming 171(1):217–282.Crossref, Google Scholar
[4] Boob D, Deng Q, Lan G (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197:215–279.Crossref, Google Scholar
[5] Boob D, Deng Q, Lan G (2024) Level constrained first order methods for function constrained optimization. Math. Programming, 1–61.Google Scholar
[6] Bottou L, Curtis F, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60:223–311.Crossref, Google Scholar
[7] Cartis C, Gould NI, Toint PL (2014) On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Programming 144(1):93–106.Crossref, Google Scholar
[8] Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):27.Crossref, Google Scholar
[9] Chatterjee N, Chen YH, Maas P, Carroll RJ (2016) Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J. Amer. Statist. Assoc. 111(513):107–117.Crossref, Google Scholar
[10] Chen C, Tung F, Vedula N, Mori G (2018) Constraint-aware deep neural network compression. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer Vision ECCV 2018, Lecture Notes in Computer Science, vol. 11212 (Springer, Cham, Switzerland), 400–415.Google Scholar
[11] Curtis FE, O’Neill MJ, Robinson DP (2024) Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization. Math. Programming 205:431–483.Crossref, Google Scholar
[12] Curtis FE, Robinson DP, Zhou B (2024) A stochastic inexact sequential quadratic optimization algorithm for nonlinear equality-constrained optimization. INFORMS J. Optimization 6(3–4):173–195.Google Scholar
[13] Curtis FE, Robinson DP, Zhou B (2023) Sequential quadratic optimization for stochastic optimization with deterministic nonlinear inequality and equality constraints. SIAM J. Optimization 34(4):3592–3622.Google Scholar
[14] Cutkosky A, Orabona F (2019) Momentum-based variance reduction in non-convex SGD. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
[15] Defazio A, Bach F, Lacoste-Julien S (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. NeurIPS, vol. 27 (Curran Associates, Red Hook, NY).Google Scholar
[16] Fang C, Li CJ, Lin Z, Zhang T (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. NeurIPS, vol. 31 (Curran Associates, Red Hook, NY).Google Scholar
[17] Geyer CJ (1991) Constrained maximum likelihood exemplified by isotonic convex logistic regression. J. Amer. Statist. Assoc. 86(415):717–724.Crossref, Google Scholar
[18] Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
[19] Ghadimi S, Lan G, Zhang H (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1):267–305.Crossref, Google Scholar
[20] Jia Z, Grimmer B (2022) First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points. Preprint, submitted December 2, https://arxiv.org/abs/2212.00927.Google Scholar
[21] Jin L, Wang X (2022) A stochastic primal-dual method for a class of nonconvex constrained optimization. Comput. Optim. Appl. 83:143–180.Crossref, Google Scholar
[22] Jin L, Wang X (2024) Stochastic nested primal-dual method for nonconvex constrained composition optimization. Math. Comput. 94(351):305–358.Google Scholar
[23] Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. NeurIPS, vol. 26 (Curran Associates, Red Hook, NY).Google Scholar
[24] Li Z, Xu Y (2021) Augmented Lagrangian based first-order methods for convex-constrained programs with weakly-convex objective. INFORMS J. Optim. 3(4):373–397.Link, Google Scholar
[25] Li Z, Chen PY, Liu S, Lu S, Xu Y (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Proc. 24th Internat. Conf. Artificial Intelligence Statist. (AISTATS) 2021, vol. 130 (PMLR, New York), 2170–2178.Google Scholar
[26] Lin Q, Ma R, Xu Y (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.Crossref, Google Scholar
[27] Ma R, Lin Q, Yang T (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
[28] Márquez-Neila P, Salzmann M, Fua P (2017) Imposing hard constraints on deep networks: Promises and limitations. Preprint, submitted.Google Scholar
[29] Na S, Mahoney MW (2024) Statistical inference of constrained stochastic optimization via sketched sequential quadratic programming. Preprint, submitted April 13, https://arxiv.org/abs/2205.13687.Google Scholar
[30] Na S, Anitescu M, Kolar M (2023) An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians. Math. Programming 199:721–791.Crossref, Google Scholar
[31] Na S, Anitescu M, Kolar M (2023) Inequality constrained stochastic nonlinear optimization via active-set sequential quadratic programming. Math. Programming 202:279–353.Crossref, Google Scholar
[32] Nandwani Y, Pathak A, Mausam, Singla P (2019) A primal dual formulation for deep learning with constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
[33] Nguyen LM, Liu J, Scheinberg K, Takáč M (2017) Sarah: A novel method for machine learning problems using stochastic recursive gradient. Proc. Machine Learn. Res. 70:2613–2621.Google Scholar
[34] Pham NH, Nguyen LM, Phan DT, Tran-Dinh Q (2020) Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Machine Learn. Res. 21(110):1–48.Google Scholar
[35] Pillo GD, Grippo L (1988) On the exactness of a class of nondifferentiable penalty functions. J. Optim. Theory Appl. 57:399–410.Crossref, Google Scholar
[36] Pillo GD, Grippo L (1989) Exact penalty functions in constrained optimization. SIAM J. Control Optim. 27(6):1333–1360.Crossref, Google Scholar
[37] Ravi SN, Dinh T, Lokhande VS, Singh V (2019) Explicitly imposing constraints in deep networks via conditional gradients gives improved generalization and faster convergence. AAAI’19/IAAI’19/EAAI’19 Proc. Thirty-Third AAAI Conf. Artificial Intelligence Thirty-First Innovative Appl. Artificial Intelligence Conf. Ninth AAAI Sympos. Educational Adv. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 4772–4779.Google Scholar
[38] Rockafellar R (1973) The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl. 12:555–562.Crossref, Google Scholar
[39] Roy SK, Mhammedi Z, Harandi M (2018) Geometry aware constrained optimization techniques for deep learning. CVPR, 4460–4469.Google Scholar
[40] Sahin MF, Eftekhari A, Alacaoglu A, Latorre F, Cevher V (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
[41] Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 83(1):112–162.Google Scholar
[42] Scutari G, Facchinei F, Lampariello L (2017) Parallel and distributed methods for constrained nonconvex optimization—part I: Theory. IEEE Trans. Signal Processing 65(8):1929–1944.Crossref, Google Scholar
[43] Shang F, Jiao L, Zhou K, Cheng J, Ren Y, Jin Y (2018) ASVRG: Accelerated proximal SVRG. Proc. Machine Learn. Res. 95:815–830.Google Scholar
[44] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
[45] Tomar VS, Rose RC (2014) Manifold regularized deep neural networks. Interspeech 2014.Google Scholar
[46] Tran-Dinh Q, Pham NH, Phan DT, Nguyen LM (2022) A hybrid stochastic optimization framework for composite nonconvex optimization. Math. Programming 191:1005–1071.Crossref, Google Scholar
[47] Wang X, Yuan Y (2015) An augmented Lagrangian trust region method for equality constrained optimization. Optim. Methods Software 30(3):559–582.Crossref, Google Scholar
[48] Wang X, Zhang H (2015) An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software 30(5):934–964.Crossref, Google Scholar
[49] Wang X, Ma S, Yuan Y (2017) Penalty methods with stochastic approximation for stochastic nonlinear programming. Math. Comput. 86(306):1793–1820.Crossref, Google Scholar
[50] Wang Z, Ji K, Zhou Y, Liang Y, Tarokh V (2019) Spiderboost and momentum: Faster variance reduction algorithms. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
[51] Wright S, Nocedal J (2006) Numerical Optimization (Springer, New York).Google Scholar
[52] Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4):2057–2075.Crossref, Google Scholar
[53] Xu Y (2020) Primal-dual stochastic gradient method for convex programs with many functional constraints. SIAM J. Optim. 30(2):1664–1692.Crossref, Google Scholar
[54] Xu Y (2021) First-order methods for constrained convex programming based on linearized augmented Lagrangian function. INFORMS J. Optim. 3(1):89–117.Link, Google Scholar
[55] Xu Y, Xu Y (2023) Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. J. Optim. Theory Appl. 196(1):266–297.Crossref, Google Scholar
[56] Xu M, Ye JJ, Zhang L (2015) Smoothing augmented Lagrangian method for nonsmooth constrained optimization problems. J. Global Optim. 62(4):675–694.Crossref, Google Scholar
[57] Zhu Y, Zabaras N, Koutsourelakis PS, Perdikaris P (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 394:56–81.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 51, Issue 1

February 2026

Pages iv-viii, 1-851

Article Information

Metrics

Information

Received:July 20, 2022
Accepted:November 12, 2024
Published Online:January 23, 2025

Cite as

Qiankun Shi, Xiao Wang, Hao Wang (2025) A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization. Mathematics of Operations Research 51(1):92-133.

https://doi.org/10.1287/moor.2022.0193

Keywords

Acknowledgments

The authors thank Dr. Qi Deng for kindly sharing codes of the level constrained proximal gradient.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization

References

Volume 51, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News