An Alternating Manifold Proximal Gradient Method for Sparse Principal Component Analysis and Sparse Canonical Correlation Analysis

Published Online:https://doi.org/10.1287/ijoo.2019.0032

References

  • Absil PA, Mahony R, Sepulchre R (2009) Optimization Algorithms on Matrix Manifolds (Princeton University Press, Princeton, NJ).Google Scholar
  • Attouch H, Bolte J, Redont P, Soubeyran A (2010) Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka-Łojasiewicz inequality. Math. Oper. Res. 35(2):438–457.LinkGoogle Scholar
  • Baik J, Silverstein JW (2006) Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97(6):1382–1408.Google Scholar
  • Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1):183–202.Google Scholar
  • Bendory T, Eldar YC, Boumal N (2018) Non-convex phase retrieval from STFT measurements. IEEE Trans. Inform. Theory 64(1):467–484.Google Scholar
  • Bolte J, Sabach S, Teboulle M (2014) Proximal alternating linearized minimization or nonconvex and nonsmooth problems. Math. Programming 146(1–2):459–494.Google Scholar
  • Boumal N (2016) Nonconvex phase synchronization. SIAM J. Optim. 26(4):2355–2377.Google Scholar
  • Boumal N, Absil PA (2011) RTRMC: A Riemannian trust-region method for low-rank matrix completion. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Proc. 24th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 406–414.Google Scholar
  • Chen M, Gao C, Ren Z, Zhou HH (2013) Sparse CCA via precision adjusted iterative thresholding. Preprint, submitted November 24, https://arxiv.org/abs/1311.6186.Google Scholar
  • Chen S, Ma S, So AMC, Zhang T (2020) Proximal gradient method for nonsmooth optimization over the Stiefel manifold. SIAM J. Optim. 30(1):210–239.Google Scholar
  • Cherian A, Sra S (2017) Riemannian dictionary learning and sparse coding for positive definite matrices. IEEE Trans. Neural Networks Learn. Systems 28(12):2859–2871.Google Scholar
  • d’Aspremont A (2011) Identifying small mean-reverting portfolios. Quant. Finance 11(3):351–364.Google Scholar
  • d’Aspremont A, Bach F, Ghaoui LE (2008) Optimal solutions for sparse principal component analysis. J. Machine Learn. Res. 9:1269–1294.Google Scholar
  • d’Aspremont A, Ghaoui LE, Jordan MI, Lanckriet GRG (2007) A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49(3):434–448.Google Scholar
  • Erichson NB, Zheng P, Manoharz K, Brunton SL, Kutz JN, Aravkin AY (2020) Sparse principal component analysis via variable projection. SIAM J. Appl. Math. Forthcoming.Google Scholar
  • Facchinei F, Pang J (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, Berlin).Google Scholar
  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J. Statist. Software 33(1):1–22.Google Scholar
  • Gao C, Ma Z, Zhou HH (2017) Sparse CCA: Adaptive estimation and computational barriers. Ann. Statist. 45(5):2074–2101.Google Scholar
  • Hardoon DR, Shawe-Taylor J (2011) Sparse canonical correlation analysis. Machine Learn. 83(3):331–353.Google Scholar
  • Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Generalized Hessian matrix and second-order optimality conditions for problems with C1,1 data. Appl. Math. Optim. 11(1):43–56.Google Scholar
  • Hotelling H (1936) Relations between two sets of variates. Biometrika 28(3–4):321–377.Google Scholar
  • Huang W, Hand P (2018) Blind deconvolution by a steepest descent algorithm on a quotient manifold. SIAM J. Imaging Sci. 11(4):2757–2785.Google Scholar
  • Jolliffe I, Trendafilov N, Uddin M (2003) A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12(3):531–547.Google Scholar
  • Journee M, Nesterov Y, Richtarik P, Sepulchre R (2010) Generalized power method for sparse principal component analysis. J. Machine Learn. Res. 11(February):517–553.Google Scholar
  • Li X, Sun D, Toh KC (2018) A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems. SIAM J. Optim. 28(1):433–458.Google Scholar
  • Liu H, Yue MC, So AMC (2017) On the estimation performance and convergence rate of the generalized power method for phase synchronization. SIAM J. Optim. 27(4):2426–2446.Google Scholar
  • Lu Z, Zhang Y (2012) An augmented Lagrangian approach for sparse principal component analysis. Math. Programming 135:149–193.Google Scholar
  • Ma S (2013) Alternating direction method of multipliers for sparse principal component analysis. J. Oper. Res. Soc. China 1(2):253–274.Google Scholar
  • Magnus JR, Neudecker H (2018) Matrix Differential Calculus with Applications in Statistics and Econometrics (Wiley, Hoboken, NJ).Google Scholar
  • Mifflin R (1977) Semismooth and semiconvex functions in constrained optimization. SIAM J. Control Optim. 15(6):959–972.Google Scholar
  • Moghaddam B, Weiss Y, Avidan S (2005) Spectral bounds for sparse PCA: Exact and greedy algorithms. Weiss Y, Scholkopf B, Platt JC, eds. Proc. 18th Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 915–922.Google Scholar
  • Nadler B (2008) Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36(6):2791–2817.Google Scholar
  • Parkhomenko E, Tritchler D, Beyene J (2009) Sparse canonical correlation analysis with application to genomic data integration. Statist. Appl. Genetics Molecular Biol. 8(1):1–34.Google Scholar
  • Paul D (2007) Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica 17:1617–1642.Google Scholar
  • Pearson K (1901) LIII. On lines and planes of closest fit to systems of points in space. London Edinburgh Dublin Philos. Magazine J. Sci. 2(11):559–572.Google Scholar
  • Qi H, Sun D (2011) An augmented Lagrangian dual approach for the H-weighted nearest correlation matrix problem. IMA J. Numer. Anal. 31(2):491–511.Google Scholar
  • Qi L, Sun J (1993) A nonsmooth version of Newton’s method. Math. Programming 58:353–367.Google Scholar
  • Shen H, Huang JZ (2008) Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99(6):1015–1034.Google Scholar
  • Solodov MV, Svaiter BF (1998) A globally convergent inexact Newton method for systems of monotone equations. Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods (Springer, Dordrecht), 355–369.Google Scholar
  • Sun J, Qu Q, Wright J (2017) Complete dictionary recovery over the sphere I: Overview and the geometric picture. IEEE Trans. Inform. Theory 63(2):853–884.Google Scholar
  • Sun J, Qu Q, Wright J (2018) A geometrical analysis of phase retrieval. Foundations Comput. Math. 18(5):1131–1198.Google Scholar
  • Suo X, Minden V, Nelson B, Tibshirani R, Saunders M (2017) Sparse canonical correlation analysis. Preprint, submitted May 30, https://arxiv.org/abs/1705.10865.Google Scholar
  • Ulbrich M (2011) Semismooth Newton Methods for Variational Inequalities and Constrained Optimization Problems in Function Spaces, vol. 11 (SIAM, Philadelphia).Google Scholar
  • Vandereycken B (2013) Low-rank matrix completion by Riemannian optimization. SIAM J. Optim. 23(2):1214–1236.Google Scholar
  • Vu VQ, Cho J, Lei J, Rohe K (2013) Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. Burges CJC, Bouttou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2670–2678.Google Scholar
  • Wang C, Sun D, Toh KC (2010) Solving log-determinant optimization problems by a Newton-CG primal proximal point algorithm. SIAM J. Optim. 20(6):2994–3013.Google Scholar
  • Wiesel A, Kliger M, Hero AO III (2008) A greedy approach to sparse canonical correlation analysis. Preprint, submitted January 17, arXiv:0801.2748.Google Scholar
  • Witten DM, Tibshirani R, Hastie T (2009) A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatist. 10(3):515–534.Google Scholar
  • Xiao X, Li Y, Wen Z, Zhang L (2018) A regularized semi-smooth Newton method with projection steps for composite convex programs. J. Sci. Comput. 76(1):364–389.Google Scholar
  • Yang J, Sun D, Toh KC (2013) A proximal point algorithm for log-determinant optimization with group lasso regularization. SIAM J. Optim. 23(2):857–893.Google Scholar
  • Yang L, Sun D, Toh KC (2015) SDPNAL+: A majorized semismooth Newton-CG augmented Lagrangian method for semidefinite programming with nonnegative constraints. Math. Programming Comput. 7(3):331–366.Google Scholar
  • Yuan XT, Zhang T (2013) Truncated power method for sparse eigenvalue problems. J. Machine Learn. Res. 14(1):899–925.Google Scholar
  • Zhao X, Sun D, Toh KC (2010) A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4):1737–1765.Google Scholar
  • Zhou G, Toh KC (2005) Superlinear convergence of a Newton-type algorithm for monotone equations. J. Optim. Theory Appl. 125(1):205–221.Google Scholar
  • Zou H, Xue L (2018) A selective overview of sparse principal component analysis. Proc. IEEE 106(8):1311–1320.Google Scholar
  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. B 67(2):301–320.Google Scholar
  • Zou H, Hastie T, Tibshirani R (2006) Sparse principal component analysis. J. Comput. Graphical Statist. 15(2):265–286.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.