Can Learning Be Explained by Local Optimality in Robust Low-Rank Matrix Recovery?

Jianhao Ma
Jianhao Ma
[email protected]
https://orcid.org/0000-0003-1440-6645
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109
Search for more papers by this author
,
Salar Fattahi
Corresponding Author
Salar Fattahi
[email protected]
https://orcid.org/0000-0001-7986-3148
Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109
Search for more papers by this author

Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Search for more papers by this author

Salar Fattahi

Corresponding Author

Salar Fattahi

[email protected]

https://orcid.org/0000-0001-7986-3148

Department of Industrial and Operations Engineering, University of Michigan, Ann Arbor, Michigan 48109

Search for more papers by this author

Published Online:23 Jul 2025https://doi.org/10.1287/moor.2023.0371

References

[1] Bhojanapalli S, Neyshabur B, Srebro N (2016) Global optimality of local search for low rank matrix recovery. Adv. Neural Inform. Processing Systems 29.Google Scholar
[2] Bianchi P, Hachem W, Schechtman S (2024) Stochastic subgradient descent escapes active strict saddles on weakly convex functions. Math. Oper. Res. 49(3):1761–1790.Link, Google Scholar
[3] Bouwmans T, Zahzah EH (2014) Robust PCA via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122:22–34.Crossref, Google Scholar
[4] Burer S, Monteiro RD (2003) A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization. Math. Programming 95(2):329–357.Crossref, Google Scholar
[5] Burer S, Monteiro RD (2005) Local minima and convergence in low-rank semidefinite programming. Math. Programming 103(3):427–444.Crossref, Google Scholar
[6] Candès EJ, Plan Y (2011) Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. IEEE Trans. Inform. Theory 57(4):2342–2359.Crossref, Google Scholar
[7] Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J. ACM 58(3):11.Crossref, Google Scholar
[8] Chandrasekaran V, Sanghavi S, Parrilo PA, Willsky AS (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21(2):572–596.Crossref, Google Scholar
[9] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21(6):1505–1593.Crossref, Google Scholar
[10] Chen Y (2015) Incoherence-optimal matrix completion. IEEE Trans. Inform. Theory 61(5):2909–2923.Crossref, Google Scholar
[11] Chen Y, Chi Y (2018) Harnessing structures in big data via guaranteed low-rank matrix estimation: Recent theory and fast algorithms via convex and nonconvex optimization. IEEE Signal Processing Magazine 35(4):14–31.Crossref, Google Scholar
[12] Chi Y, Lu YM, Chen Y (2019) Nonconvex optimization meets low-rank matrix factorization: An overview. IEEE Trans. Signal Processing 67(20):5239–5269.Crossref, Google Scholar
[13] Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Crossref, Google Scholar
[14] Clarke FH (1975) Generalized gradients and applications. Trans. Amer. Math. Soc. 205:247–262.Crossref, Google Scholar
[15] Clarke FH (1990) Optimization and Nonsmooth Analysis (SIAM, Philadelphia).Crossref, Google Scholar
[16] Dauphin YN, Pascanu R, Gulcehre C, Cho K, Ganguli S, Bengio Y (2014) Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. Adv. Neural Inform. Processing Systems 27.Google Scholar
[17] Davis D, Drusvyatskiy D (2022) Proximal methods avoid active strict saddles of weakly convex functions. Foundations Comput. Math. 22(2):561–606.Crossref, Google Scholar
[18] Davis D, Díaz M, Drusvyatskiy D (2022) Escaping strict saddle points of the Moreau envelope in nonsmooth optimization. SIAM J. Optim. 32(3):1958–1983.Crossref, Google Scholar
[19] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://arxiv.org/abs/2108.11832v1.Google Scholar
[20] Ding L, Jiang L, Chen Y, Qu Q, Zhu Z (2021) Rank overspecified robust matrix recovery: Subgradient method and exact recovery. Adv. Neural Inform. Processing Systems 34:26767–26778.Google Scholar
[21] Fang C, Lin Z, Zhang T (2019) Sharp analysis for nonconvex SGD escaping from saddle points. Conf. Learn. Theory (PMLR, Cambridge, MA), 1192–1234.Google Scholar
[22] Fattahi S, Sojoudi S (2020) Exact guarantees on the absence of spurious local minima for non-negative rank-1 robust principal component analysis. J. Machine Learn. Res. 21:1–51.Google Scholar
[23] Ge R, Jin C, Zheng Y (2017) No spurious local minima in nonconvex low rank problems: A unified geometric analysis. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1233–1242.Google Scholar
[24] Ge R, Lee JD, Ma T (2016) Matrix completion has no spurious local minimum. Adv. Neural Inform. Processing Systems 29.Google Scholar
[25] Guillemin V, Pollack A (2010) Differential Topology, vol. 370 (American Mathematical Society, Providence, RI).Google Scholar
[26] Jin C, Ge R, Netrapalli P, Kakade SM, Jordan MI (2017) How to escape saddle points efficiently. Internat. Conf. Machine Learn. (PMLR, Cambridge, MA), 1724–1732.Google Scholar
[27] Josz C, Lai L (2022) Nonsmooth rank-one matrix factorization landscape. Optim. Lett. 16(6):1611–1631.Crossref, Google Scholar
[28] Ledoux M, Talagrand M (1991) Probability in Banach Spaces: Isoperimetry and Processes, vol. 23 (Springer Science & Business Media, Heidelberg).Crossref, Google Scholar
[29] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent only converges to minimizers. Conf. Learn. Theory (PMLR, Cambridge, MA), 1246–1257.Google Scholar
[30] Lee N, Sreenivasan K, Lee JD, Lee K, Papailiopoulos D (2024) Teaching arithmetic to small transformers. Internat. Conf. Learn. Representations ICLR 2024 (PMLR, Cambridge, MA).Google Scholar
[31] Li J, So AMC, Ma WK (2020) Understanding notions of stationarity in nonsmooth optimization: A guided tour of various constructions of subdifferential for nonsmooth functions. IEEE Signal Processing Magazine 37(5):18–31.Crossref, Google Scholar
[32] Li S, Li Q, Zhu Z, Tang G, Wakin MB (2020) The global geometry of centralized and distributed low-rank matrix recovery without regularization. IEEE Signal Processing Lett. 27:1400–1404.Crossref, Google Scholar
[33] Li X, Zhu Z, Man-Cho So A, Vidal R (2020) Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1):660–686.Crossref, Google Scholar
[34] Luo X, Zhou M, Xia Y, Zhu Q (2014) An efficient non-negative matrix-factorization-based approach to collaborative filtering for recommender systems. IEEE Trans. Indust. Informatics 10(2):1273–1284.Crossref, Google Scholar
[35] Ma J, Fattahi S (2021) Sign-RIP: A robust restricted isometry property for low-rank matrix recovery. Preprint, submitted September 28, https://arxiv.org/abs/2102.02969.Google Scholar
[36] Ma J, Fattahi S (2023) Global convergence of sub-gradient method for robust matrix recovery: Small initialization, noisy measurements, and over-parameterization. J. Machine Learn. Res. 24(96):1–84.Google Scholar
[37] Ma J, Fattahi S (2024) Convergence of gradient descent with small initialization for unregularized matrix completion. Thirty-Seventh Annu. Conf. Learn. Theory (PMLR, Cambridge, MA), 3683–3742.Google Scholar
[38] Ma Z, Bi Y, Lavaei J, Sojoudi S (2022) Sharp restricted isometry property bounds for low-rank matrix recovery problems with corrupted measurements. Proc. AAAI Conf. Artificial Intelligence, vol. 36 (IEEE, Piscataway, NJ), 7672–7681.Google Scholar
[39] Molybog I, Sojoudi S, Lavaei J (2022) Role of sparsity and structure in the optimization landscape of non-convex matrix sensing. Math. Programming 193:75–111.Crossref, Google Scholar
[40] Rockafellar RT, Wets RJB (2009) Variational Analysis, vol. 317 (Springer Science & Business Media, Heidelberg).Google Scholar
[41] Soltanolkotabi M, Stöger D, Xie C (2025) Implicit balancing and regularization: Generalization and convergence guarantees for overparameterized asymmetric matrix sensing. IEEE Trans. Inform. Theory 71(4):2991–3037.Crossref, Google Scholar
[42] Stöger D, Soltanolkotabi M (2021) Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction. Adv. Neural Inform. Processing Systems 34:23831–23843.Google Scholar
[43] Sun R (2019) Optimization for deep learning: Theory and algorithms. Preprint, submitted December 19, https://arxiv.org/abs/1912.08957.Google Scholar
[44] Sun RY (2020) Optimization for deep learning: An overview. J. Oper. Res. Soc. China 8(2):249–294.Crossref, Google Scholar
[45] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18(5):1131–1198.Crossref, Google Scholar
[46] Szarek SJ (1982) Nets of Grassmann manifold and orthogonal group. Proc. Res. Workshop Banach Space Theory Iowa City Iowa 1981, vol. 169 (University of Iowa, Iowa City, IA), 169–186.Google Scholar
[47] Tong T, Ma C, Chi Y (2021) Low-rank matrix recovery with scaled subgradient methods: Fast and robust convergence without the condition number. IEEE Trans. Signal Processing 69:2396–2409.Crossref, Google Scholar
[48] Van Handel R (2014) Probability in high dimension. Technical report, Princeton University, Princeton, NJ.Google Scholar
[49] Vershynin R (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027v1.Google Scholar
[50] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[51] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[52] Xu X, Shen Y, Chi Y, Ma C (2023) The power of preconditioning in overparameterized low-rank matrix sensing. Internat. Conf. Machine Learn. (PMLR, New York), 38611–38654.Google Scholar
[53] Zhang RY (2021) Sharp global guarantees for nonconvex low-rank matrix recovery in the overparameterized regime. Preprint, submitted April 21, https://arxiv.org/abs/2104.10790v1.Google Scholar
[54] Zhang RY (2024) Improved global guarantees for the nonconvex Burer–Monteiro factorization via rank overparameterization. Math. Program., 1–30.Crossref, Google Scholar
[55] Zhang J, Zhang R (2020) How many samples is a good initial point worth in low-rank matrix recovery? Adv. Neural Inform. Processing Systems 33:12583–12592.Google Scholar
[56] Zhang G, Fattahi S, Zhang RY (2023) Preconditioned gradient descent for overparameterized nonconvex Burer–Monteiro factorization with global optimality certification. J. Machine Learn. Res. 24(163):1–55.Google Scholar
[57] Zhang RY, Lavaei J, Baldick R (2019) Spurious local minima in power system state estimation. IEEE Trans. Control Network Systems 6(3):1086–1096.Crossref, Google Scholar
[58] Zhang Y, Qu Q, Wright J (2020) From symmetry to geometry: Tractable nonconvex problems. Preprint, submitted July 14, https://arxiv.org/abs/2007.06753v1.Google Scholar
[59] Zhu Z, Li Q, Tang G, Wakin MB (2018) Global optimality in low-rank matrix optimization. IEEE Trans. Signal Processing 66(13):3614–3628.Crossref, Google Scholar
[60] Zhu Z, Li Q, Tang G, Wakin MB (2021) The global optimization geometry of low-rank matrix optimization. IEEE Trans. Inform. Theory 67(2):1308–1331.Crossref, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:December 10, 2023
Accepted:June 01, 2025
Published Online:July 23, 2025

Cite as

Jianhao Ma, Salar Fattahi (2025) Can Learning Be Explained by Local Optimality in Robust Low-Rank Matrix Recovery?. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2023.0371

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Can Learning Be Explained by Local Optimality in Robust Low-Rank Matrix Recovery?

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News