Can Learning Be Explained by Local Optimality in Robust Low-Rank Matrix Recovery?
Published Online:23 Jul 2025https://doi.org/10.1287/moor.2023.0371
References
- [1] (2016) Global optimality of local search for low rank matrix recovery. Adv. Neural Inform. Processing Systems 29.Google Scholar
- [2] (2024) Stochastic subgradient descent escapes active strict saddles on weakly convex functions. Math. Oper. Res. 49(3):1761–1790.Link, Google Scholar
- [3] (2014) Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122:22–34.Crossref, Google Scholar
- [4] (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Programming 95(2):329–357.Crossref, Google Scholar
- [5] (2005) Local minima and convergence in low-rank semidefinite programming. Math. Programming 103(3):427–444.Crossref, Google Scholar
- [6] (2011) Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57(4):2342–2359.Crossref, Google Scholar
- [7] (2011) Robust principal component analysis? J. ACM 58(3):11.Crossref, Google Scholar
- [8] (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2):572–596.Crossref, Google Scholar
- [9] (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21(6):1505–1593.Crossref, Google Scholar
- [10] (2015) Incoherence-optimal matrix completion. IEEE Trans. Inform. Theory 61(5):2909–2923.Crossref, Google Scholar
- [11] (2018) Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization. IEEE Signal Processing Magazine 35(4):14–31.Crossref, Google Scholar
- [12] (2019) Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Processing 67(20):5239–5269.Crossref, Google Scholar
- [13] (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Crossref, Google Scholar
- [14] (1975) Generalized gradients and applications. Trans. Amer. Math. Soc. 205:247–262.Crossref, Google Scholar
- [15] (1990) Optimization and Nonsmooth Analysis (SIAM, Philadelphia).Crossref, Google Scholar
- [16] (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Adv. Neural Inform. Processing Systems 27.Google Scholar
- [17] (2022) Proximal methods avoid active strict saddles of weakly convex functions. Foundations Comput. Math. 22(2):561–606.Crossref, Google Scholar
- [18] (2022) Escaping strict saddle points of the Moreau envelope in nonsmooth optimization. SIAM J. Optim. 32(3):1958–1983.Crossref, Google Scholar
- [19] (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://arxiv.org/abs/2108.11832v1.Google Scholar
- [20] (2021) Rank overspecified robust matrix recovery: Subgradient method and exact recovery. Adv. Neural Inform. Processing Systems 34:26767–26778.Google Scholar
- [21] (2019) Sharp analysis for nonconvex SGD escaping from saddle points. Conf. Learn. Theory (PMLR, Cambridge, MA), 1192–1234.Google Scholar
- [22] (2020) Exact guarantees on the absence of spurious local minima for non-negative rank-1 robust principal component analysis. J. Machine Learn. Res. 21:1–51.Google Scholar
- [23] (2017) No spurious local minima in nonconvex low rank problems: A unified geometric analysis. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1233–1242.Google Scholar
- [24] (2016) Matrix completion has no spurious local minimum. Adv. Neural Inform. Processing Systems 29.Google Scholar
- [25] (2010) Differential Topology, vol. 370 (American Mathematical Society, Providence, RI).Google Scholar
- [26] (2017) How to escape saddle points efficiently. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1724–1732.Google Scholar
- [27] (2022) Nonsmooth rank-one matrix factorization landscape. Optim. Lett. 16(6):1611–1631.Crossref, Google Scholar
- [28] (1991) Probability in Banach Spaces: Isoperimetry and Processes, vol. 23 (Springer Science & Business Media, Heidelberg).Crossref, Google Scholar
- [29] (2016) Gradient descent only converges to minimizers. Conf. Learn. Theory (PMLR, Cambridge, MA), 1246–1257.Google Scholar
- [30] (2024) Teaching arithmetic to small transformers. Internat. Conf. Learn. Representations ICLR 2024 (PMLR, Cambridge, MA).Google Scholar
- [31] (2020) Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. IEEE Signal Processing Magazine 37(5):18–31.Crossref, Google Scholar
- [32] (2020) The global geometry of centralized and distributed low-rank matrix recovery without regularization. IEEE Signal Processing Lett. 27:1400–1404.Crossref, Google Scholar
- [33] (2020) Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1):660–686.Crossref, Google Scholar
- [34] (2014) An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems. IEEE Trans. Indust. Informatics 10(2):1273–1284.Crossref, Google Scholar
- [35] (2021) Sign-RIP: A robust restricted isometry property for low-rank matrix recovery. Preprint, submitted September 28, https://arxiv.org/abs/2102.02969.Google Scholar
- [36] (2023) Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization. J. Machine Learn. Res. 24(96):1–84.Google Scholar
- [37] (2024) Convergence of gradient descent with small initialization for unregularized matrix completion. Thirty-Seventh Annu. Conf. Learn. Theory (PMLR, Cambridge, MA), 3683–3742.Google Scholar
- [38] (2022) Sharp restricted isometry property bounds for low-rank matrix recovery problems with corrupted measurements. Proc. AAAI Conf. Artificial Intelligence, vol. 36 (IEEE, Piscataway, NJ), 7672–7681.Google Scholar
- [39] (2022) Role of sparsity and structure in the optimization landscape of non-convex matrix sensing. Math. Programming 193:75–111.Crossref, Google Scholar
- [40] (2009) Variational Analysis, vol. 317 (Springer Science & Business Media, Heidelberg).Google Scholar
- [41] (2025) Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing. IEEE Trans. Inform. Theory 71(4):2991–3037.Crossref, Google Scholar
- [42] (2021) Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. Adv. Neural Inform. Processing Systems 34:23831–23843.Google Scholar
- [43] (2019) Optimization for deep learning: Theory and algorithms. Preprint, submitted December 19, https://arxiv.org/abs/1912.08957.Google Scholar
- [44] (2020) Optimization for deep learning: An overview. J. Oper. Res. Soc. China 8(2):249–294.Crossref, Google Scholar
- [45] (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18(5):1131–1198.Crossref, Google Scholar
- [46] (1982) Nets of Grassmann manifold and orthogonal group. Proc. Res. Workshop Banach Space Theory Iowa City Iowa 1981, vol. 169 (University of Iowa, Iowa City, IA), 169–186.Google Scholar
- [47] (2021) Low-rank matrix recovery with scaled subgradient methods: Fast and robust convergence without the condition number. IEEE Trans. Signal Processing 69:2396–2409.Crossref, Google Scholar
- [48] (2014) Probability in high dimension. Technical report, Princeton University, Princeton, NJ.Google Scholar
- [49] (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027v1.Google Scholar
- [50] (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [51] (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [52] (2023) The power of preconditioning in overparameterized low-rank matrix sensing. Internat. Conf. Machine Learn. (PMLR, New York), 38611–38654.Google Scholar
- [53] (2021) Sharp global guarantees for nonconvex low-rank matrix recovery in the overparameterized regime. Preprint, submitted April 21, https://arxiv.org/abs/2104.10790v1.Google Scholar
- [54] (2024) Improved global guarantees for the nonconvex Burer–Monteiro factorization via rank overparameterization. Math. Program., 1–30.Crossref, Google Scholar
- [55] (2020) How many samples is a good initial point worth in low-rank matrix recovery? Adv. Neural Inform. Processing Systems 33:12583–12592.Google Scholar
- [56] (2023) Preconditioned gradient descent for overparameterized nonconvex Burer–Monteiro factorization with global optimality certification. J. Machine Learn. Res. 24(163):1–55.Google Scholar
- [57] (2019) Spurious local minima in power system state estimation. IEEE Trans. Control Network Systems 6(3):1086–1096.Crossref, Google Scholar
- [58] (2020) From symmetry to geometry: Tractable nonconvex problems. Preprint, submitted July 14, https://arxiv.org/abs/2007.06753v1.Google Scholar
- [59] (2018) Global optimality in low-rank matrix optimization. IEEE Trans. Signal Processing 66(13):3614–3628.Crossref, Google Scholar
- [60] (2021) The global optimization geometry of low-rank matrix optimization. IEEE Trans. Inform. Theory 67(2):1308–1331.Crossref, Google Scholar

