A Momentum-Based Linearized Augmented Lagrangian Method for Nonconvex Constrained Stochastic Optimization

Published Online:https://doi.org/10.1287/moor.2022.0193

References

  • [1] Arjevani Y, Carmon Y, Duchi JC, Foster DJ, Srebro N, Woodworth B (2023) Lower bounds for non-convex stochastic optimization. Math. Programming 199:165–214.CrossrefGoogle Scholar
  • [2] Berahas AS, Curtis FE, Robinson D, Zhou B (2021) Sequential quadratic optimization for nonlinear equality constrained stochastic optimization. SIAM J. Optim. 31(2):1352–1379.CrossrefGoogle Scholar
  • [3] Bertsimas D, Gupta V, Kallus N (2018) Robust sample average approximation. Math. Programming 171(1):217–282.CrossrefGoogle Scholar
  • [4] Boob D, Deng Q, Lan G (2023) Stochastic first-order methods for convex and nonconvex functional constrained optimization. Math. Programming 197:215–279.CrossrefGoogle Scholar
  • [5] Boob D, Deng Q, Lan G (2024) Level constrained first order methods for function constrained optimization. Math. Programming, 1–61.Google Scholar
  • [6] Bottou L, Curtis F, Nocedal J (2018) Optimization methods for large-scale machine learning. SIAM Rev. 60:223–311.CrossrefGoogle Scholar
  • [7] Cartis C, Gould NI, Toint PL (2014) On the complexity of finding first-order critical points in constrained nonlinear optimization. Math. Programming 144(1):93–106.CrossrefGoogle Scholar
  • [8] Chang CC, Lin CJ (2011) LIBSVM: A library for support vector machines. ACM Trans. Intelligent Systems Tech. 2(3):27.CrossrefGoogle Scholar
  • [9] Chatterjee N, Chen YH, Maas P, Carroll RJ (2016) Constrained maximum likelihood estimation for model calibration using summary-level information from external big data sources. J. Amer. Statist. Assoc. 111(513):107–117.CrossrefGoogle Scholar
  • [10] Chen C, Tung F, Vedula N, Mori G (2018) Constraint-aware deep neural network compression. Ferrari V, Hebert M, Sminchisescu C, Weiss Y, eds. Computer Vision ECCV 2018, Lecture Notes in Computer Science, vol. 11212 (Springer, Cham, Switzerland), 400–415.Google Scholar
  • [11] Curtis FE, O’Neill MJ, Robinson DP (2024) Worst-case complexity of an SQP method for nonlinear equality constrained stochastic optimization. Math. Programming 205:431–483.CrossrefGoogle Scholar
  • [12] Curtis FE, Robinson DP, Zhou B (2024) A stochastic inexact sequential quadratic optimization algorithm for nonlinear equality-constrained optimization. INFORMS J. Optimization 6(3–4):173–195.Google Scholar
  • [13] Curtis FE, Robinson DP, Zhou B (2023) Sequential quadratic optimization for stochastic optimization with deterministic nonlinear inequality and equality constraints. SIAM J. Optimization 34(4):3592–3622.Google Scholar
  • [14] Cutkosky A, Orabona F (2019) Momentum-based variance reduction in non-convex SGD. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
  • [15] Defazio A, Bach F, Lacoste-Julien S (2014) Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. NeurIPS, vol. 27 (Curran Associates, Red Hook, NY).Google Scholar
  • [16] Fang C, Li CJ, Lin Z, Zhang T (2018) Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator. NeurIPS, vol. 31 (Curran Associates, Red Hook, NY).Google Scholar
  • [17] Geyer CJ (1991) Constrained maximum likelihood exemplified by isotonic convex logistic regression. J. Amer. Statist. Assoc. 86(415):717–724.CrossrefGoogle Scholar
  • [18] Ghadimi S, Lan G (2013) Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM J. Optim. 23(4):2341–2368.CrossrefGoogle Scholar
  • [19] Ghadimi S, Lan G, Zhang H (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1):267–305.CrossrefGoogle Scholar
  • [20] Jia Z, Grimmer B (2022) First-order methods for nonsmooth nonconvex functional constrained optimization with or without slater points. Preprint, submitted December 2, https://arxiv.org/abs/2212.00927.Google Scholar
  • [21] Jin L, Wang X (2022) A stochastic primal-dual method for a class of nonconvex constrained optimization. Comput. Optim. Appl. 83:143–180.CrossrefGoogle Scholar
  • [22] Jin L, Wang X (2024) Stochastic nested primal-dual method for nonconvex constrained composition optimization. Math. Comput. 94(351):305–358.Google Scholar
  • [23] Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. NeurIPS, vol. 26 (Curran Associates, Red Hook, NY).Google Scholar
  • [24] Li Z, Xu Y (2021) Augmented Lagrangian based first-order methods for convex-constrained programs with weakly-convex objective. INFORMS J. Optim. 3(4):373–397.LinkGoogle Scholar
  • [25] Li Z, Chen PY, Liu S, Lu S, Xu Y (2021) Rate-improved inexact augmented Lagrangian method for constrained nonconvex optimization. Proc. 24th Internat. Conf. Artificial Intelligence Statist. (AISTATS) 2021, vol. 130 (PMLR, New York), 2170–2178.Google Scholar
  • [26] Lin Q, Ma R, Xu Y (2022) Complexity of an inexact proximal-point penalty method for constrained smooth non-convex optimization. Comput. Optim. Appl. 82(1):175–224.CrossrefGoogle Scholar
  • [27] Ma R, Lin Q, Yang T (2020) Quadratically regularized subgradient methods for weakly convex optimization with weakly convex constraints. Internat. Conf. Machine Learn. (PMLR, New York), 6554–6564.Google Scholar
  • [28] Márquez-Neila P, Salzmann M, Fua P (2017) Imposing hard constraints on deep networks: Promises and limitations. Preprint, submitted.Google Scholar
  • [29] Na S, Mahoney MW (2024) Statistical inference of constrained stochastic optimization via sketched sequential quadratic programming. Preprint, submitted April 13, https://arxiv.org/abs/2205.13687.Google Scholar
  • [30] Na S, Anitescu M, Kolar M (2023) An adaptive stochastic sequential quadratic programming with differentiable exact augmented Lagrangians. Math. Programming 199:721–791.CrossrefGoogle Scholar
  • [31] Na S, Anitescu M, Kolar M (2023) Inequality constrained stochastic nonlinear optimization via active-set sequential quadratic programming. Math. Programming 202:279–353.CrossrefGoogle Scholar
  • [32] Nandwani Y, Pathak A, Mausam, Singla P (2019) A primal dual formulation for deep learning with constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
  • [33] Nguyen LM, Liu J, Scheinberg K, Takáč M (2017) Sarah: A novel method for machine learning problems using stochastic recursive gradient. Proc. Machine Learn. Res. 70:2613–2621.Google Scholar
  • [34] Pham NH, Nguyen LM, Phan DT, Tran-Dinh Q (2020) Proxsarah: An efficient algorithmic framework for stochastic composite nonconvex optimization. J. Machine Learn. Res. 21(110):1–48.Google Scholar
  • [35] Pillo GD, Grippo L (1988) On the exactness of a class of nondifferentiable penalty functions. J. Optim. Theory Appl. 57:399–410.CrossrefGoogle Scholar
  • [36] Pillo GD, Grippo L (1989) Exact penalty functions in constrained optimization. SIAM J. Control Optim. 27(6):1333–1360.CrossrefGoogle Scholar
  • [37] Ravi SN, Dinh T, Lokhande VS, Singh V (2019) Explicitly imposing constraints in deep networks via conditional gradients gives improved generalization and faster convergence. AAAI’19/IAAI’19/EAAI’19 Proc. Thirty-Third AAAI Conf. Artificial Intelligence Thirty-First Innovative Appl. Artificial Intelligence Conf. Ninth AAAI Sympos. Educational Adv. Artificial Intelligence, vol. 33 (AAAI Press, Palo Alto, CA), 4772–4779.Google Scholar
  • [38] Rockafellar R (1973) The multiplier method of Hestenes and Powell applied to convex programming. J. Optim. Theory Appl. 12:555–562.CrossrefGoogle Scholar
  • [39] Roy SK, Mhammedi Z, Harandi M (2018) Geometry aware constrained optimization techniques for deep learning. CVPR, 4460–4469.Google Scholar
  • [40] Sahin MF, Eftekhari A, Alacaoglu A, Latorre F, Cevher V (2019) An inexact augmented Lagrangian framework for nonconvex optimization with nonlinear constraints. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
  • [41] Schmidt M, Le Roux N, Bach F (2017) Minimizing finite sums with the stochastic average gradient. Math. Programming 83(1):112–162.Google Scholar
  • [42] Scutari G, Facchinei F, Lampariello L (2017) Parallel and distributed methods for constrained nonconvex optimization—part I: Theory. IEEE Trans. Signal Processing 65(8):1929–1944.CrossrefGoogle Scholar
  • [43] Shang F, Jiao L, Zhou K, Cheng J, Ren Y, Jin Y (2018) ASVRG: Accelerated proximal SVRG. Proc. Machine Learn. Res. 95:815–830.Google Scholar
  • [44] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [45] Tomar VS, Rose RC (2014) Manifold regularized deep neural networks. Interspeech 2014.Google Scholar
  • [46] Tran-Dinh Q, Pham NH, Phan DT, Nguyen LM (2022) A hybrid stochastic optimization framework for composite nonconvex optimization. Math. Programming 191:1005–1071.CrossrefGoogle Scholar
  • [47] Wang X, Yuan Y (2015) An augmented Lagrangian trust region method for equality constrained optimization. Optim. Methods Software 30(3):559–582.CrossrefGoogle Scholar
  • [48] Wang X, Zhang H (2015) An augmented Lagrangian affine scaling method for nonlinear programming. Optim. Methods Software 30(5):934–964.CrossrefGoogle Scholar
  • [49] Wang X, Ma S, Yuan Y (2017) Penalty methods with stochastic approximation for stochastic nonlinear programming. Math. Comput. 86(306):1793–1820.CrossrefGoogle Scholar
  • [50] Wang Z, Ji K, Zhou Y, Liang Y, Tarokh V (2019) Spiderboost and momentum: Faster variance reduction algorithms. NeurIPS, vol. 32 (Curran Associates, Red Hook, NY).Google Scholar
  • [51] Wright S, Nocedal J (2006) Numerical Optimization (Springer, New York).Google Scholar
  • [52] Xiao L, Zhang T (2014) A proximal stochastic gradient method with progressive variance reduction. SIAM J. Optim. 24(4):2057–2075.CrossrefGoogle Scholar
  • [53] Xu Y (2020) Primal-dual stochastic gradient method for convex programs with many functional constraints. SIAM J. Optim. 30(2):1664–1692.CrossrefGoogle Scholar
  • [54] Xu Y (2021) First-order methods for constrained convex programming based on linearized augmented Lagrangian function. INFORMS J. Optim. 3(1):89–117.LinkGoogle Scholar
  • [55] Xu Y, Xu Y (2023) Momentum-based variance-reduced proximal stochastic gradient method for composite nonconvex stochastic optimization. J. Optim. Theory Appl. 196(1):266–297.CrossrefGoogle Scholar
  • [56] Xu M, Ye JJ, Zhang L (2015) Smoothing augmented Lagrangian method for nonsmooth constrained optimization problems. J. Global Optim. 62(4):675–694.CrossrefGoogle Scholar
  • [57] Zhu Y, Zabaras N, Koutsourelakis PS, Perdikaris P (2019) Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys. 394:56–81.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.