A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization
Published Online:23 Jan 2025https://doi.org/10.1287/moor.2022.0193
References
- [1] (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199:165–214.Crossref, Google Scholar
- [2] (2021) Sequential quadratic optimization for nonlinear equality constrained stochastic optimization. SIAM J. Optim. 31(2):1352–1379.Crossref, Google Scholar
- [3] (2018) Robust sample average approximation. Math. Programming 171(1):217–282.Crossref, Google Scholar
- [4] (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197:215–279.Crossref, Google Scholar
- [5] (2024) Level constrained first order methods for function constrained optimization. Math. Programming, 1–61.Google Scholar
- [6] (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60:223–311.Crossref, Google Scholar
- [7] (2014) On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Programming 144(1):93–106.Crossref, Google Scholar
- [8] (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):27.Crossref, Google Scholar
- [9] (2016) Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J. Amer. Statist. Assoc. 111(513):107–117.Crossref, Google Scholar
- [10] (2018) Constraint-aware deep neural network compression. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer Vision ECCV 2018, Lecture Notes in Computer Science, vol. 11212 (Springer, Cham, Switzerland), 400–415.Google Scholar
- [11] (2024) Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization. Math. Programming 205:431–483.Crossref, Google Scholar
- [12] (2024) A stochastic inexact sequential quadratic optimization algorithm for nonlinear equality-constrained optimization. INFORMS J. Optimization 6(3–4):173–195.Google Scholar
- [13] (2023) Sequential quadratic optimization for stochastic optimization with deterministic nonlinear inequality and equality constraints. SIAM J. Optimization 34(4):3592–3622.Google Scholar
- [14] (2019) Momentum-based variance reduction in non-convex SGD. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
- [15] (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. NeurIPS, vol. 27 (Curran Associates, Red Hook, NY).Google Scholar
- [16] (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. NeurIPS, vol. 31 (Curran Associates, Red Hook, NY).Google Scholar
- [17] (1991) Constrained maximum likelihood exemplified by isotonic convex logistic regression. J. Amer. Statist. Assoc. 86(415):717–724.Crossref, Google Scholar
- [18] (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.Crossref, Google Scholar
- [19] (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1):267–305.Crossref, Google Scholar
- [20] (2022) First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points. Preprint, submitted December 2, https://arxiv.org/abs/2212.00927.Google Scholar
- [21] (2022) A stochastic primal-dual method for a class of nonconvex constrained optimization. Comput. Optim. Appl. 83:143–180.Crossref, Google Scholar
- [22] (2024) Stochastic nested primal-dual method for nonconvex constrained composition optimization. Math. Comput. 94(351):305–358.Google Scholar
- [23] (2013) Accelerating stochastic gradient descent using predictive variance reduction. NeurIPS, vol. 26 (Curran Associates, Red Hook, NY).Google Scholar
- [24] (2021) Augmented Lagrangian based first-order methods for convex-constrained programs with weakly-convex objective. INFORMS J. Optim. 3(4):373–397.Link, Google Scholar
- [25] (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Proc. 24th Internat. Conf. Artificial Intelligence Statist. (AISTATS) 2021, vol. 130 (PMLR, New York), 2170–2178.Google Scholar
- [26] (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.Crossref, Google Scholar
- [27] (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
- [28] (2017) Imposing hard constraints on deep networks: Promises and limitations. Preprint, submitted.Google Scholar
- [29] (2024) Statistical inference of constrained stochastic optimization via sketched sequential quadratic programming. Preprint, submitted April 13, https://arxiv.org/abs/2205.13687.Google Scholar
- [30] (2023) An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians. Math. Programming 199:721–791.Crossref, Google Scholar
- [31] (2023) Inequality constrained stochastic nonlinear optimization via active-set sequential quadratic programming. Math. Programming 202:279–353.Crossref, Google Scholar
- [32] (2019) A primal dual formulation for deep learning with constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
- [33] (2017) Sarah: A novel method for machine learning problems using stochastic recursive gradient. Proc. Machine Learn. Res. 70:2613–2621.Google Scholar
- [34] (2020) Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Machine Learn. Res. 21(110):1–48.Google Scholar
- [35] (1988) On the exactness of a class of nondifferentiable penalty functions. J. Optim. Theory Appl. 57:399–410.Crossref, Google Scholar
- [36] (1989) Exact penalty functions in constrained optimization. SIAM J. Control Optim. 27(6):1333–1360.Crossref, Google Scholar
- [37] (2019) Explicitly imposing constraints in deep networks via conditional gradients gives improved generalization and faster convergence. AAAI’19/IAAI’19/EAAI’19 Proc. Thirty-Third AAAI Conf. Artificial Intelligence Thirty-First Innovative Appl. Artificial Intelligence Conf. Ninth AAAI Sympos. Educational Adv. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 4772–4779.Google Scholar
- [38] (1973) The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl. 12:555–562.Crossref, Google Scholar
- [39] (2018) Geometry aware constrained optimization techniques for deep learning. CVPR, 4460–4469.Google Scholar
- [40] (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
- [41] (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 83(1):112–162.Google Scholar
- [42] (2017) Parallel and distributed methods for constrained nonconvex optimization—part I: Theory. IEEE Trans. Signal Processing 65(8):1929–1944.Crossref, Google Scholar
- [43] (2018) ASVRG: Accelerated proximal SVRG. Proc. Machine Learn. Res. 95:815–830.Google Scholar
- [44] (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
- [45] (2014) Manifold regularized deep neural networks. Interspeech 2014.Google Scholar
- [46] (2022) A hybrid stochastic optimization framework for composite nonconvex optimization. Math. Programming 191:1005–1071.Crossref, Google Scholar
- [47] (2015) An augmented Lagrangian trust region method for equality constrained optimization. Optim. Methods Software 30(3):559–582.Crossref, Google Scholar
- [48] (2015) An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software 30(5):934–964.Crossref, Google Scholar
- [49] (2017) Penalty methods with stochastic approximation for stochastic nonlinear programming. Math. Comput. 86(306):1793–1820.Crossref, Google Scholar
- [50] (2019) Spiderboost and momentum: Faster variance reduction algorithms. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
- [51] (2006) Numerical Optimization (Springer, New York).Google Scholar
- [52] (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4):2057–2075.Crossref, Google Scholar
- [53] (2020) Primal-dual stochastic gradient method for convex programs with many functional constraints. SIAM J. Optim. 30(2):1664–1692.Crossref, Google Scholar
- [54] (2021) First-order methods for constrained convex programming based on linearized augmented Lagrangian function. INFORMS J. Optim. 3(1):89–117.Link, Google Scholar
- [55] (2023) Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. J. Optim. Theory Appl. 196(1):266–297.Crossref, Google Scholar
- [56] (2015) Smoothing augmented Lagrangian method for nonsmooth constrained optimization problems. J. Global Optim. 62(4):675–694.Crossref, Google Scholar
- [57] (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 394:56–81.Crossref, Google Scholar

