On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization

Dan Garber
Corresponding Author
Dan Garber
[email protected]
https://orcid.org/0000-0002-5181-9193
Technion–Israel Institute of Technology, Haifa 3200003, Israel
Search for more papers by this author
,
Atara Kaplan
Atara Kaplan
[email protected]
Technion–Israel Institute of Technology, Haifa 3200003, Israel
Search for more papers by this author

Dan Garber

Corresponding Author

Dan Garber

[email protected]

https://orcid.org/0000-0002-5181-9193

Technion–Israel Institute of Technology, Haifa 3200003, Israel

Search for more papers by this author

Atara Kaplan

[email protected]

Technion–Israel Institute of Technology, Haifa 3200003, Israel

Search for more papers by this author

Published Online:7 Dec 2022https://doi.org/10.1287/moor.2022.1332

References

[1] Allen-Zhu Z, Li Y (2017) Follow the compressed leader: Faster online learning of eigenvectors and faster MMWU. Doina P, Yee WT, eds. Proc. 34th Internat. Conf. Machine Learn., vol. 70 (JMLR), 116–125.Google Scholar
[2] Allen-Zhu Z, Lee YT, Orecchia L (2016) Using optimization to obtain a width-independent, parallel, simpler, and faster positive SDP solver. Krauthgamer R, ed. Proc. 27th Annual ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 1824–1831.Google Scholar
[3] Alman J, Williams VV (2021) A refined laser method and faster matrix multiplication. Marx D, ed. Proc. 2021 ACM-SIAM Sympos. Discrete Algorithms (Society for Industrial and Applied Mathematics, Philadelphia), 522–539.Google Scholar
[4] Beck A (2017) First-Order Methods in Optimization (Society for Industrial and Applied Mathematics, Philadelphia).Crossref, Google Scholar
[5] Beck A, Teboulle M (2003) Mirror descent and nonlinear projected subgradient methods for convex optimization. Oper. Res. Lett. 31(3):167–175.Crossref, Google Scholar
[6] Bubeck S (2015) Convex optimization: Algorithms and complexity. Foundations Trends Machine Learn. 8(3-4):231–357.Crossref, Google Scholar
[7] Candès EJ, Recht B (2009) Exact matrix completion via convex optimization. Foundations Comput. Math. 9(6):717–772.Crossref, Google Scholar
[8] Candès EJ, Eldar YC, Strohmer T, Voroninski V (2015) Phase retrieval via matrix completion. SIAM Rev. 57(2):225–251.Crossref, Google Scholar
[9] Candès EJ, Li X, Ma Y, Wright J (2009) Robust principal component analysis? J. ACM 58(3):1–37.Crossref, Google Scholar
[10] Carmon Y, Duchi JC, Sidford A, Tian K (2019) A Rank-1 sketch for matrix multiplicative weights (COLT). Alina B, Daniel H, eds. Proc. 32nd Conf. Learn. Theory, vol. 99, Proceedings of Machine Learning Research Series (PMLR), 589–623.Google Scholar
[11] Chung F, Lu L (2006) Concentration inequalities and martingale inequalities: A survey. Internet Math. 3(1):79–127.Crossref, Google Scholar
[12] Ding L, Fei Y, Xu Q, Yang C (2020) Spectral Frank-Wolfe algorithm: Strict complementarity and linear convergence (ICML). Daumé III, Aarti S, eds. Proc. 37th Internat. Conf. Machine Learn., vol, 119, Proceedings of Machine Learning Research Series (PMLR), 2535–2544.Google Scholar
[13] Drusvyatskiy D, Lewis AS (2018) Error bounds, quadratic growth, and linear convergence of proximal methods. Math. Oper. Res. 43(3):919–948.Link, Google Scholar
[14] Garber D (2019) On the convergence of projected-gradient methods with low-rank projections for smooth convex minimization over trace-norm balls and related problems. SIAM J. Optim. 31(1):727–753.Crossref, Google Scholar
[15] Garber D (2020) On the convergence of stochastic gradient descent with low-rank projections for convex low-rank matrix problems. Abernethy JD, Agarwal S, eds. Conf. Learn. Theory (COLT), vol. 125, Proceedings of Machine Learning Research Series (PMLR), 1666–1681.Google Scholar
[16] Garber D (2022) Linear convergence of Frank-Wolfe for rank-one matrix recovery without strong convexity. Math. Programming A. https://doi.org/10.1007/s10107-022-01821-8.Google Scholar
[17] Ge R, Lee JD, Ma T (2016) Matrix completion has no spurious local minimum. Lee D, Sugiyama M, Luxburg U, Guyon I, Garnett R, eds. Advances in Neural Information Processing Systems, vol. 29 (Curran Associates Inc., Red Hook, NY), 2973–2981.Google Scholar
[18] Golub GH, Van Loan CF (2013) Matrix Computations (JHU Press, Baltimore, MD).Crossref, Google Scholar
[19] Horn RA, Johnson CR (1994) Topics in Matrix Analysis (Cambridge University Press, Cambridge, UK).Google Scholar
[20] Jaggi M, Sulovský M (2010) A simple algorithm for nuclear norm regularized problems. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Internat. Conf. Machine Learn. (Omnipress, Madison, WI), 471–478.Google Scholar
[21] Jia Z, Stewart G (2001) An analysis of the Rayleigh–Ritz method for approximating eigenspaces. Math. Comput. 70(234):637–647.Crossref, Google Scholar
[22] Marshall AW, Olkin I, Arnold BC (2011) Inequalities: Theory of Majorization and Its Applications, vol. 143, 2nd ed. (Springer, New York).Crossref, Google Scholar
[23] Mu C, Zhang Y, Wright J, Goldfarb D (2016) Scalable robust matrix recovery: Frank-Wolfe meets proximal methods. SIAM J. Sci. Comput. 38(5):A3291–A3317.Crossref, Google Scholar
[24] Musco C, Musco C (2015) Randomized block Krylov methods for stronger and faster approximate singular value decomposition. Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, eds. Proc. 28th Internat. Conf. Neural Inform. Processing Systems, vol. 1 (MIT Press, Cambridge, MA), 1396–1404.Google Scholar
[25] Musco C, Musco C, Sidford A (2018) Stability of the Lanczos method for matrix function approximation. Czumaj A, ed. Proc. 29th Annual ACM-SIAM Sympos. Discrete Algorithms, SODA’18 (Society for Industrial and Applied Mathematics, Philadelphia), 1605–1624.Google Scholar
[26] Netrapalli P, Jain P, Sanghavi S (2013) Phase retrieval using alternating minimization. Burges CJ, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 26 (Curran Associates, Inc., Red Hook, NY), 2796–2804.Google Scholar
[27] Netrapalli P, Niranjan U, Sanghavi S, Anandkumar A, Jain P (2014) Non-convex robust PCA. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (Curran Associates, Inc., Red Hook, NY), 1107–1115.Google Scholar
[28] Peng R, Tangwongsan K (2012) Faster and simpler width-independent parallel algorithms for positive semidefinite programming. Blelloch GE, Herlihy M, eds. Proc. 24th Annual ACM Sympos. Parallelism Algorithms Architectures (Association for Computing Machinery, New York), 101–108.Google Scholar
[29] Recht B (2011) A simpler approach to matrix completion. J. Machine Learn. Res. 12:3413–3430.Google Scholar
[30] Saad Y (2011) Numerical Methods for Large Eigenvalue Problems, rev. ed. (Manchester University Press, Manchester, UK).Crossref, Google Scholar
[31] Thompson CJ (1965) Inequality with applications in statistical mechanics. J. Math. Phys. 6(11):1812–1813.Crossref, Google Scholar
[32] Tropp JA (2012) User-friendly tail bounds for sums of random matrices. Foundations Comput. Math. 12(4):389–434.Crossref, Google Scholar
[33] Tsuda K, Rätsch G, Warmuth MK (2005) Matrix exponentiated gradient updates for on-line learning and Bregman projection. J. Machine Learn. Res. 6(34):995–1018.Google Scholar
[34] Wright J, Ganesh A, Rao S, Peng Y, Ma Y (2009) Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. Bengio Y, Schuurmans D, Lafferty J, Williams C, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 22 (Curran Associates, Inc., Red Hook, NY), 2080–2088.Google Scholar
[35] Yi P, Park D, Chen Y, Caramanis C (2016) Fast algorithms for robust PCA via gradient descent. Lee DD, von Luxburg U, eds. Adv. Neural Inform. Processing Systems, vol. 30 (Curran Associates Inc., Red Hook, NY), 4159–4167.Google Scholar
[36] Yu YL (2013) The strong convexity of von Neumann’s entropy. Unpublished manuscript.Google Scholar
[37] Yurtsever A, Udell M, Tropp JA, Cevher V (2017) Sketchy decisions: Convex low-rank matrix optimization with optimal storage. Sangal R, Mehta H, Bagga RK, eds. Proc. 20th Internat. Conf. Artificial Intelligence Statist., vol. 54 (Morgan Kaufmann Publishers Inc., San Francisco), 1188–1196.Google Scholar
[38] Yurtsever A, Tropp J, Fercoq O, Udell M, Cevher V (2021) Scalable semidefinite programming. SIAM J. Math. Data Sci. 3(1):171–200.Crossref, Google Scholar
[39] Zhou Z, So AMC (2017) A unified approach to error bounds for structured convex optimization problems. Math. Programming 165(2):689–728.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 48, Issue 4

November 2023

Pages 1811-2382, C2

Article Information

Metrics

Information

Received:December 17, 2020
Accepted:September 28, 2022
Published Online:December 07, 2022

Cite as

Dan Garber, Atara Kaplan (2022) On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization. Mathematics of Operations Research 48(4):2094-2128.

https://doi.org/10.1287/moor.2022.1332

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

On the Efficient Implementation of the Matrix Exponentiated Gradient Algorithm for Low-Rank Matrix Optimization

References

Volume 48, Issue 4

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News