Sparse PCA with Multiple Components

Published Online:https://doi.org/10.1287/opre.2023.0598

Sparse principal component analysis (PCA) is a fundamental technique for obtaining interpretable combinations of features, or principal components (PCs), that explain the variance of high-dimensional data sets. This involves solving a sparsity- and orthogonality-constrained convex maximization problem, which is extremely computationally challenging. Most existing work addresses sparse PCA via methods—such as iteratively computing one sparse PC and deflating the covariance matrix—that do not guarantee the orthogonality of the resulting solution when we seek multiple mutually orthogonal PCs. We challenge this status quo by investigating three competitive approaches that each produce valid relaxations and high-quality solutions. Together, these relaxations and their associated heuristics yield solutions with an average bound gap on the order of 3% for real-world data sets with hundreds or thousands of features and r{2,3} components. The first approach reformulates orthogonality conditions as rank constraints and thereby derives tight semidefinite relaxations, strengthened via additional second-order cone inequalities. The second approach uses Lagrangian decompositions to relax the orthogonality constraints via a penalty term in the objective, thus solving the problem as a sequence of r=1 sparse PC problems. The third approach is based on a new combinatorial upper bound on the variance explained for a given support pattern, obtained by solving a mixed-integer linear optimization problem. Numerically, our algorithms match (and sometimes surpass) the best-performing methods in terms of the fraction of variance explained and systematically return PCs that are sparse and orthogonal. In contrast, we find that existing methods, such as deflation, return solutions that violate the orthogonality constraints, even when the data are generated from sparse orthogonal PCs. Altogether, our approaches find near-optimal solutions to sparse PCA problems with multiple components in a practically tractable fashion.

Supplemental Material: All supplemental materials, including the code, data, and files required to reproduce the results, are available at https://doi.org/10.1287/opre.2023.0598.

INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.