Dissolving Constraints for Riemannian Optimization

Nachuan Xiao
Nachuan Xiao
[email protected]
https://orcid.org/0000-0002-2725-2633
Institute of Operational Research and Analytics, National University of Singapore, Singapore 119076;
Search for more papers by this author
,
Xin Liu
Xin Liu
[email protected]
https://orcid.org/0000-0002-5705-0805
State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 101408, China;
Search for more papers by this author
,
Kim-Chuan Toh
Corresponding Author
Kim-Chuan Toh
[email protected]
https://orcid.org/0000-0001-7204-8933
Institute of Operational Research and Analytics, National University of Singapore, Singapore 119076;Department of Mathematics, National University of Singapore, Singapore 119076
Search for more papers by this author

Institute of Operational Research and Analytics, National University of Singapore, Singapore 119076;

State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China;School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 101408, China;

Search for more papers by this author

Kim-Chuan Toh

Corresponding Author

Kim-Chuan Toh

[email protected]

https://orcid.org/0000-0001-7204-8933

Institute of Operational Research and Analytics, National University of Singapore, Singapore 119076;Department of Mathematics, National University of Singapore, Singapore 119076

Search for more papers by this author

Published Online:8 Mar 2023https://doi.org/10.1287/moor.2023.1360

References

[1] Abrudan T, Eriksson J, Koivunen V (2009) Conjugate gradient algorithm for optimization under unitary matrix constraint. Signal Processing 89(9):1704–1714.Crossref, Google Scholar
[2] Abrudan TE, Eriksson J, Koivunen V (2008) Steepest descent algorithms for optimization under unitary matrix constraint. IEEE Trans. Signal Processing 56(3):1134–1147.Crossref, Google Scholar
[3] Absil PA, Baker CG, Gallivan KA (2007) Trust-region methods on Riemannian manifolds. Foundations Comput. Math. 7(3):303–330.Crossref, Google Scholar
[4] Absil PA, Mahony R, Sepulchre R (2009) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ).Google Scholar
[5] Bai Z, Li RC (2014) Minimization principles and computation for the generalized linear response eigenvalue problem. BIT Numerical Math. 54(1):31–54.Crossref, Google Scholar
[6] Bécigneul G, Ganea OE (2018) Riemannian adaptive optimization methods. Preprint, submitted October 1, https://arxiv.org/abs/1810.00760.Google Scholar
[7] Bolte J, Sabach S, Teboulle M (2014) Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Programming 146(1):459–494.Crossref, Google Scholar
[8] Boumal N (2020) An introduction to optimization on smooth manifolds. Accessed February 24, 2023, https://www.nicolasboumal.net/book/.Google Scholar
[9] Boumal N, Absil PA (2015) Low-rank matrix completion via preconditioned optimization on the grassmann manifold. Linear Algebra Appl. 475:200–239.Crossref, Google Scholar
[10] Boumal N, Mishra B, Absil PA, Sepulchre R (2014) Manopt, a matlab toolbox for optimization on manifolds. J. Machine Learn. Res. 15(1):1455–1459.Google Scholar
[11] Bradbury J, Frostig R, Hawkins P, Johnson MJ, Leary C, Maclaurin D, Necula G, et al. (2018) JAX: Composable transformations of Python+NumPy programs. Accessed, http://github.com/google/jax.Google Scholar
[12] Byrd RH, Lu P, Nocedal J, Zhu C (1995) A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 16(5):1190–1208.Crossref, Google Scholar
[13] Cartis C, Gould NI, Toint PL (2012) An adaptive cubic regularization algorithm for nonconvex optimization with convex constraints and its function-evaluation complexity. IMA J. Numerical Anal. 32(4):1662–1695.Crossref, Google Scholar
[14] Cartis C, Gould NI, Toint PL (2012) Complexity bounds for second-order optimality in unconstrained optimization. J. Complexity 28(1):93–108.Crossref, Google Scholar
[15] Cartis C, Gould NI, Toint PL (2018) Worst-case evaluation complexity and optimality of second-order methods for nonconvex smooth optimization. Proc. Internat. Congress Mathematicians (World Scientific), 3711–3750.Google Scholar
[16] Conn AR, Gould NI, Toint PL (2000) Trust Region Methods (SIAM, Philadelphia).Crossref, Google Scholar
[17] Criscitiello C, Boumal N (2019) Efficiently escaping saddle points on manifolds. Preprint, submitted October 23, https://arxiv.org/abs/1906.04321.Google Scholar
[18] Criscitiello C, Boumal N (2022) An accelerated first-order method for non-convex optimization on manifolds. Foundations Comput. Math., https://doi.org/10.1007/s10208-022-09573-9.Crossref, Google Scholar
[19] Di Pillo G, Grippo L (1986) An exact penalty function method with global convergence properties for nonlinear programming problems. Math. Programming 36(1):1–18.Crossref, Google Scholar
[20] Edelman A, Arias TA, Smith ST (1998) The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2):303–353.Crossref, Google Scholar
[21] Estrin R, Friedlander MP, Orban D, Saunders MA (2020) Implementing a smooth exact penalty function for equality-constrained nonlinear optimization. SIAM J. Sci. Comput. 42(3):A1809–A1835.Crossref, Google Scholar
[22] Fletcher R (1970) A class of methods for nonlinear programming with termination and convergence properties. Abadie J, ed. Integer and Nonlinear Programming (North-Holland, Amsterdam), 157–173.Google Scholar
[23] Fletcher R, Leyffer S (2002) Nonlinear programming without a penalty function. Math. Programming 91(2):239–269.Crossref, Google Scholar
[24] Gao B, Liu X, Yuan Y-x (2019) Parallelizable algorithms for optimization problems with orthogonality constraints. SIAM J. Sci. Comput. 41(3):A1949–A1983.Crossref, Google Scholar
[25] Gao B, Son NT, Absil PA, Stykel T (2021) Riemannian optimization on the symplectic Stiefel manifold. SIAM J. Optim. 31(2):1546–1575.Crossref, Google Scholar
[26] Ge R, Huang F, Jin C, Yuan Y (2015) Escaping from saddle points–online stochastic gradient for tensor decomposition. Conf. Learn. Theory (PMLR), 797–842.Google Scholar
[27] Golub GH, Van Loan CF (2013) Matrix Computations (JHU Press, Baltimore).Crossref, Google Scholar
[28] Gould NI, Lucidi S, Roma M, Toint PL (1999) Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 9(2):504–525.Crossref, Google Scholar
[29] Grippo L, Lampariello F, Lucidi S (1986) A nonmonotone line search technique for Newton’s method. SIAM J. Numerical Anal. 23(4):707–716.Crossref, Google Scholar
[30] Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, Wieser E, et al. (2020) Array programming with NumPy. Nature 585(7825):357–362.Crossref, Google Scholar
[31] Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bureau Standards 49(6):409–436.Crossref, Google Scholar
[32] Hosseini S (2015) Convergence of nonsmooth descent methods via Kurdyka–Lojasiewicz inequality on Riemannian manifolds. 2015 INS Technical report No. 1523, Hausdorff Center for Mathematics and Institute for Numerical Simulation, University of Bonn, Bonn, Germany.Google Scholar
[33] Hu J, Liu X, Wen ZW, Yuan YX (2020) A brief introduction to manifold optimization. J. Oper. Res. Soc. China 8(2):199–248.Crossref, Google Scholar
[34] Hu X, Liu X (2020) An efficient orthonormalization-free approach for sparse dictionary learning and dual principal component pursuit. Sensors (Basel) 20(11):3041.Crossref, Google Scholar
[35] Jin C, Netrapalli P, Jordan MI (2018) Accelerated gradient descent escapes saddle points faster than gradient descent. Conf. Learn. Theory (PMLR), 1042–1085.Google Scholar
[36] Jin C, Ge R, Netrapalli P, Kakade SM, Jordan MI (2017) How to escape saddle points efficiently. Internat. Conf. Machine Learn. (PMLR), 1724–1732.Google Scholar
[37] Kochurov M, Karimov R, Kozlukov S (2020) Geoopt: Riemannian optimization in pytorch. Preprint, submitted July 17, https://arxiv.org/abs/2005.02819.Google Scholar
[38] Lee JD, Panageas I, Piliouras G, Simchowitz M, Jordan MI, Recht B (2019) First-order methods almost always avoid strict saddle points. Math. Programming 176(1):311–337.Crossref, Google Scholar
[39] Lei Y, Hu T, Li G, Tang K (2019) Stochastic gradient descent for nonconvex learning without bounded gradient assumptions. IEEE Trans. Neural Networks Learn. Systems 31(10):4394–4400.Crossref, Google Scholar
[40] Lenders F, Kirches C, Potschka A (2018) trlib: A vector-free implementation of the GLTR method for iterative solution of the trust region problem. Optim. Methods Software 33(3):420–449.Crossref, Google Scholar
[41] Li H, Lin Z (2015) Accelerated proximal gradient methods for nonconvex programming. Adv. Neural Inform. Processing Systems 28 (NIPS 2015), 379–387.Google Scholar
[42] Li L, Toh KC (2010) An inexact interior point method for l 1-regularized sparse covariance selection. Math. Programming Comput. 2(3):291–315.Crossref, Google Scholar
[43] Łojasiewicz S (1961) Sur le probleme de la division (Instytut Matematyczny Polskiej Akademi Nauk, Warsaw).Google Scholar
[44] Lojasiewicz S (1963) Une propriété topologique des sous-ensembles analytiques réels, les équations aux dérivées partielles. Les Éditions aux Dérivées Partielles, vol. 117, 87–89.Google Scholar
[45] Maclaurin D, Duvenaud D, Adams RP (2015) Autograd: Effortless gradients in numpy. ICML 2015 AutoML Workshop, vol. 238.Google Scholar
[46] Meghwanshi M, Jawanpuria P, Kunchukuttan A, Kasai H, Mishra B (2018) McTorch, a manifold optimization library for deep learning. Preprint, submitted October 4, https://arxiv.org/abs/1810.01811.Google Scholar
[47] Morales JL, Nocedal J (2011) Remark on “algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound constrained optimization.” ACM Trans. Math. Software 38(1):7.Crossref, Google Scholar
[48] Nash SG (1984) Newton-type minimization via the Lanczos method. SIAM J. Numerical Anal. 21(4):770–788.Crossref, Google Scholar
[49] Nesterov Y, Polyak BT (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108(1):177–205.Crossref, Google Scholar
[50] Nickel M, Kiela D (2018) Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. Internat. Conf. Machine Learn. (PMLR), 3779–3788.Google Scholar
[51] Nocedal J, Wright S (2006) Numerical Optimization (Springer Science & Business Media, New York).Google Scholar
[52] Ochs P, Chen Y, Brox T, Pock T (2014) ipiano: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 7(2):1388–1419.Crossref, Google Scholar
[53] Powell MJ (1969) A method for nonlinear constraints in minimization problems. Fletcher R, ed. Optimization (Academic Press, New York), 283–298.Google Scholar
[54] Qi C, Gallivan KA, Absil PA (2010) Riemannian BFGS algorithm with applications. Diehl M, Glineur F, Jarlebring E, Michiels W, eds. Recent Advances in Optimization and Its Applications in Engineering (Springer, Berlin), 183–192.Crossref, Google Scholar
[55] Sato H (2016) A Dai–Yuan-type Riemannian conjugate gradient method with the weak Wolfe conditions. Comput. Optim. Appl. 64(1):101–118.Crossref, Google Scholar
[56] Scott DS (1981) Solving sparse symmetric generalized eigenvalue problems without factorization. SIAM J. Numerical Anal. 18(1):102–110.Crossref, Google Scholar
[57] Siegel JW (2019) Accelerated optimization with orthogonality constraints. Preprint, submitted October 20, https://arxiv.org/abs/1903.05204v3.Google Scholar
[58] Son NT, Absil PA, Gao B, Stykel T (2021) Computing symplectic eigenpairs of symmetric positive-definite matrices via trace minimization and Riemannian optimization. SIAM J. Matrix Anal. Appl. 42(4):1732–1757.Crossref, Google Scholar
[59] Steihaug T (1983) The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numerical Anal. 20(3):626–637.Crossref, Google Scholar
[60] Toint P (1981) Toward an efficient sparsity exploiting Newton method for minimization. Duff IS, ed. Sparse Matrices and Their Uses (Academic Press, London), 57–88.Google Scholar
[61] Townsend J, Koep N, Weichwald S (2016) Pymanopt: A python toolbox for optimization on manifolds using automatic differentiation. Preprint, submitted September 8, https://arxiv.org/abs/1603.03236.Google Scholar
[62] Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, et al. (2020) Scipy 1.0: Fundamental algorithms for scientific computing in python. Nature Methods 17(3):261–272.Crossref, Google Scholar
[63] Wang L, Gao B, Liu X (2021) Multipliers correction methods for optimization problems over the Stiefel manifold. CSIAM Trans. Appl. Math. 2(2021):508–531.Crossref, Google Scholar
[64] Wang Z, Zhou Y, Liang Y, Lan G (2020) Cubic regularization with momentum for nonconvex optimization. Proc. 35th Uncertainty Artificial Intelligence Conf. (PMLR), 313–322.Google Scholar
[65] Wen Z, Yin W (2013) A feasible method for optimization with orthogonality constraints. Math. Programming 142(1–2):397–434.Crossref, Google Scholar
[66] Wu RB, Chakrabarti R, Rabitz H (2010) Critical landscape topology for optimization on the symplectic group. J. Optim. Theory Appl. 145(2):387–406.Crossref, Google Scholar
[67] Xiao N, Liu X (2021) Solving optimization problems over the Stiefel manifold by smooth exact penalty function. Preprint, submitted November 23, https://arxiv.org/abs/2110.08986v2.Google Scholar
[68] Xiao N, Liu X, Yuan Y-x (2020) A class of smooth exact penalty function methods for optimization problems with orthogonality constraints. Optim. Methods Software 37(4):1205–1241.Crossref, Google Scholar
[69] Xiao N, Liu X, Yuan Y-x (2021) A penalty-free infeasible approach for a class of nonsmooth opimtization problems over the Stiefel manifold. Preprint, submitted March 28, https://arxiv.org/abs/2103.03514.Google Scholar
[70] Xiao N, Liu X, Yuan Y-x (2021) Exact penalty function for ℓ2,1 norm minimization over the Stiefel manifold. SIAM J. Optim. 31(4):3097–3126.Crossref, Google Scholar
[71] Yuan Yx (2015) Recent advances in trust region algorithms. Math. Programming 151(1):249–281.Crossref, Google Scholar
[72] Zavala VM, Anitescu M (2014) Scalable nonlinear programming via exact differentiable penalty functions and trust-region Newton methods. SIAM J. Optim. 24(1):528–558.Crossref, Google Scholar
[73] Zhang E, Noakes L (2020) Riemannian cubics in quadratic matrix Lie groups. Appl. Math. Comput. 375:125082.Google Scholar
[74] Zhang H, Sra S (2018) Toward Riemannian accelerated gradient methods. Preprint, submitted June 7, https://arxiv.org/abs/1806.02812.Google Scholar
[75] Zhang J, Zhang H, Sra S (2018) R-SPIDER: A fast Riemannian stochastic optimization algorithm with curvature independent rate. Preprint, submitted December 14, https://arxiv.org/abs/1811.04194.Google Scholar
[76] Zhu C, Byrd RH, Lu P, Nocedal J (1997) Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Software 23(4):550–560.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 1

February 2024

Pages 1-651, C2

Article Information

Metrics

Information

Received:March 21, 2022
Accepted:February 03, 2023
Published Online:March 08, 2023

Cite as

Nachuan Xiao, Xin Liu, Kim-Chuan Toh (2023) Dissolving Constraints for Riemannian Optimization. Mathematics of Operations Research 49(1):366-397.

https://doi.org/10.1287/moor.2023.1360

Keywords

Acknowledgments.

The authors thank the reviewers for their helpful suggestions to improve the paper.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Dissolving Constraints for Riemannian Optimization

References

Volume 49, Issue 1

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News