Can Learning Be Explained by Local Optimality in Robust Low-Rank Matrix Recovery?

Published Online:https://doi.org/10.1287/moor.2023.0371

References

  • [1] Bhojanapalli S, Neyshabur B, Srebro N (2016) Global optimality of local search for low rank matrix recovery. Adv. Neural Inform. Processing Systems 29.Google Scholar
  • [2] Bianchi P, Hachem W, Schechtman S (2024) Stochastic subgradient descent escapes active strict saddles on weakly convex functions. Math. Oper. Res. 49(3):1761–1790.LinkGoogle Scholar
  • [3] Bouwmans T, Zahzah EH (2014) Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122:22–34.CrossrefGoogle Scholar
  • [4] Burer S, Monteiro RD (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Programming 95(2):329–357.CrossrefGoogle Scholar
  • [5] Burer S, Monteiro RD (2005) Local minima and convergence in low-rank semidefinite programming. Math. Programming 103(3):427–444.CrossrefGoogle Scholar
  • [6] Candès EJ, Plan Y (2011) Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57(4):2342–2359.CrossrefGoogle Scholar
  • [7] Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J. ACM 58(3):11.CrossrefGoogle Scholar
  • [8] Chandrasekaran V, Sanghavi S, Parrilo PA, Willsky AS (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2):572–596.CrossrefGoogle Scholar
  • [9] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21(6):1505–1593.CrossrefGoogle Scholar
  • [10] Chen Y (2015) Incoherence-optimal matrix completion. IEEE Trans. Inform. Theory 61(5):2909–2923.CrossrefGoogle Scholar
  • [11] Chen Y, Chi Y (2018) Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization. IEEE Signal Processing Magazine 35(4):14–31.CrossrefGoogle Scholar
  • [12] Chi Y, Lu YM, Chen Y (2019) Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Processing 67(20):5239–5269.CrossrefGoogle Scholar
  • [13] Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.CrossrefGoogle Scholar
  • [14] Clarke FH (1975) Generalized gradients and applications. Trans. Amer. Math. Soc. 205:247–262.CrossrefGoogle Scholar
  • [15] Clarke FH (1990) Optimization and Nonsmooth Analysis (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [16] Dauphin YN, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Adv. Neural Inform. Processing Systems 27.Google Scholar
  • [17] Davis D, Drusvyatskiy D (2022) Proximal methods avoid active strict saddles of weakly convex functions. Foundations Comput. Math. 22(2):561–606.CrossrefGoogle Scholar
  • [18] Davis D, Díaz M, Drusvyatskiy D (2022) Escaping strict saddle points of the Moreau envelope in nonsmooth optimization. SIAM J. Optim. 32(3):1958–1983.CrossrefGoogle Scholar
  • [19] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://arxiv.org/abs/2108.11832v1.Google Scholar
  • [20] Ding L, Jiang L, Chen Y, Qu Q, Zhu Z (2021) Rank overspecified robust matrix recovery: Subgradient method and exact recovery. Adv. Neural Inform. Processing Systems 34:26767–26778.Google Scholar
  • [21] Fang C, Lin Z, Zhang T (2019) Sharp analysis for nonconvex SGD escaping from saddle points. Conf. Learn. Theory (PMLR, Cambridge, MA), 1192–1234.Google Scholar
  • [22] Fattahi S, Sojoudi S (2020) Exact guarantees on the absence of spurious local minima for non-negative rank-1 robust principal component analysis. J. Machine Learn. Res. 21:1–51.Google Scholar
  • [23] Ge R, Jin C, Zheng Y (2017) No spurious local minima in nonconvex low rank problems: A unified geometric analysis. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1233–1242.Google Scholar
  • [24] Ge R, Lee JD, Ma T (2016) Matrix completion has no spurious local minimum. Adv. Neural Inform. Processing Systems 29.Google Scholar
  • [25] Guillemin V, Pollack A (2010) Differential Topology, vol. 370 (American Mathematical Society, Providence, RI).Google Scholar
  • [26] Jin C, Ge R, Netrapalli P, Kakade SM, Jordan MI (2017) How to escape saddle points efficiently. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1724–1732.Google Scholar
  • [27] Josz C, Lai L (2022) Nonsmooth rank-one matrix factorization landscape. Optim. Lett. 16(6):1611–1631.CrossrefGoogle Scholar
  • [28] Ledoux M, Talagrand M (1991) Probability in Banach Spaces: Isoperimetry and Processes, vol. 23 (Springer Science & Business Media, Heidelberg).CrossrefGoogle Scholar
  • [29] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent only converges to minimizers. Conf. Learn. Theory (PMLR, Cambridge, MA), 1246–1257.Google Scholar
  • [30] Lee N, Sreenivasan K, Lee JD, Lee K, Papailiopoulos D (2024) Teaching arithmetic to small transformers. Internat. Conf. Learn. Representations ICLR 2024 (PMLR, Cambridge, MA).Google Scholar
  • [31] Li J, So AMC, Ma WK (2020) Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. IEEE Signal Processing Magazine 37(5):18–31.CrossrefGoogle Scholar
  • [32] Li S, Li Q, Zhu Z, Tang G, Wakin MB (2020) The global geometry of centralized and distributed low-rank matrix recovery without regularization. IEEE Signal Processing Lett. 27:1400–1404.CrossrefGoogle Scholar
  • [33] Li X, Zhu Z, Man-Cho So A, Vidal R (2020) Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1):660–686.CrossrefGoogle Scholar
  • [34] Luo X, Zhou M, Xia Y, Zhu Q (2014) An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems. IEEE Trans. Indust. Informatics 10(2):1273–1284.CrossrefGoogle Scholar
  • [35] Ma J, Fattahi S (2021) Sign-RIP: A robust restricted isometry property for low-rank matrix recovery. Preprint, submitted September 28, https://arxiv.org/abs/2102.02969.Google Scholar
  • [36] Ma J, Fattahi S (2023) Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization. J. Machine Learn. Res. 24(96):1–84.Google Scholar
  • [37] Ma J, Fattahi S (2024) Convergence of gradient descent with small initialization for unregularized matrix completion. Thirty-Seventh Annu. Conf. Learn. Theory (PMLR, Cambridge, MA), 3683–3742.Google Scholar
  • [38] Ma Z, Bi Y, Lavaei J, Sojoudi S (2022) Sharp restricted isometry property bounds for low-rank matrix recovery problems with corrupted measurements. Proc. AAAI Conf. Artificial Intelligence, vol. 36 (IEEE, Piscataway, NJ), 7672–7681.Google Scholar
  • [39] Molybog I, Sojoudi S, Lavaei J (2022) Role of sparsity and structure in the optimization landscape of non-convex matrix sensing. Math. Programming 193:75–111.CrossrefGoogle Scholar
  • [40] Rockafellar RT, Wets RJB (2009) Variational Analysis, vol. 317 (Springer Science & Business Media, Heidelberg).Google Scholar
  • [41] Soltanolkotabi M, Stöger D, Xie C (2025) Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing. IEEE Trans. Inform. Theory 71(4):2991–3037.CrossrefGoogle Scholar
  • [42] Stöger D, Soltanolkotabi M (2021) Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. Adv. Neural Inform. Processing Systems 34:23831–23843.Google Scholar
  • [43] Sun R (2019) Optimization for deep learning: Theory and algorithms. Preprint, submitted December 19, https://arxiv.org/abs/1912.08957.Google Scholar
  • [44] Sun RY (2020) Optimization for deep learning: An overview. J. Oper. Res. Soc. China 8(2):249–294.CrossrefGoogle Scholar
  • [45] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18(5):1131–1198.CrossrefGoogle Scholar
  • [46] Szarek SJ (1982) Nets of Grassmann manifold and orthogonal group. Proc. Res. Workshop Banach Space Theory Iowa City Iowa 1981, vol. 169 (University of Iowa, Iowa City, IA), 169–186.Google Scholar
  • [47] Tong T, Ma C, Chi Y (2021) Low-rank matrix recovery with scaled subgradient methods: Fast and robust convergence without the condition number. IEEE Trans. Signal Processing 69:2396–2409.CrossrefGoogle Scholar
  • [48] Van Handel R (2014) Probability in high dimension. Technical report, Princeton University, Princeton, NJ.Google Scholar
  • [49] Vershynin R (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027v1.Google Scholar
  • [50] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [51] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [52] Xu X, Shen Y, Chi Y, Ma C (2023) The power of preconditioning in overparameterized low-rank matrix sensing. Internat. Conf. Machine Learn. (PMLR, New York), 38611–38654.Google Scholar
  • [53] Zhang RY (2021) Sharp global guarantees for nonconvex low-rank matrix recovery in the overparameterized regime. Preprint, submitted April 21, https://arxiv.org/abs/2104.10790v1.Google Scholar
  • [54] Zhang RY (2024) Improved global guarantees for the nonconvex Burer–Monteiro factorization via rank overparameterization. Math. Program., 1–30.CrossrefGoogle Scholar
  • [55] Zhang J, Zhang R (2020) How many samples is a good initial point worth in low-rank matrix recovery? Adv. Neural Inform. Processing Systems 33:12583–12592.Google Scholar
  • [56] Zhang G, Fattahi S, Zhang RY (2023) Preconditioned gradient descent for overparameterized nonconvex Burer–Monteiro factorization with global optimality certification. J. Machine Learn. Res. 24(163):1–55.Google Scholar
  • [57] Zhang RY, Lavaei J, Baldick R (2019) Spurious local minima in power system state estimation. IEEE Trans. Control Network Systems 6(3):1086–1096.CrossrefGoogle Scholar
  • [58] Zhang Y, Qu Q, Wright J (2020) From symmetry to geometry: Tractable nonconvex problems. Preprint, submitted July 14, https://arxiv.org/abs/2007.06753v1.Google Scholar
  • [59] Zhu Z, Li Q, Tang G, Wakin MB (2018) Global optimality in low-rank matrix optimization. IEEE Trans. Signal Processing 66(13):3614–3628.CrossrefGoogle Scholar
  • [60] Zhu Z, Li Q, Tang G, Wakin MB (2021) The global optimization geometry of low-rank matrix optimization. IEEE Trans. Inform. Theory 67(2):1308–1331.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.