Efficiency of Stochastic Coordinate Proximal Gradient Methods on Nonseparable Composite Optimization

Published Online:https://doi.org/10.1287/moor.2023.0044

References

  • [1] Aberdam A, Beck A (2021) An accelerated coordinate gradient descent algorithm for non-separable composite optimization. J. Optim. Theory Appl. 193:219–246.CrossrefGoogle Scholar
  • [2] Beck A, Tetruashvili L (2013) On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4):2037–2060.CrossrefGoogle Scholar
  • [3] Bertsekas D (1999) Nonlinear Programming (Athena Scientific, Belmont, MA).Google Scholar
  • [4] Bolte J, Daniilidis A, Lewis A (2007) The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 17(4):1205–1223.CrossrefGoogle Scholar
  • [5] Bonettini S (2021) Inexact block coordinate descent methods with application to non-negative matrix factorization. IMA J. Numerical Anal. 31(4):1431–1452.CrossrefGoogle Scholar
  • [6] Boucheron S, Lugosi G, Massart P (2012) Concentration Inequalities: A Nonasymptotic Theory of Independence (Clarendon Press, Oxford, UK).Google Scholar
  • [7] Carmon Y, Duchi JC (2019) Gradient descent efficiently finds the cubic regularized nonconvex Newton step. SIAM J. Optim. 29(3):2146–2178.CrossrefGoogle Scholar
  • [8] Cartis C, Roberts L (2023) Scalable subspace methods for derivative-free nonlinear least-squares optimization. Math. Programming 199:461–524.CrossrefGoogle Scholar
  • [9] Chung F, Lu L (2002) Connected components in random graphs with given expected degree sequences. Ann. Combin. 6:125–145.CrossrefGoogle Scholar
  • [10] Davis TA, Hu Y (2011) The University of Florida sparse matrix collection. ACM Trans. Math. Software 38(1):1–25.CrossrefGoogle Scholar
  • [11] Fercoq O, Bianchi P (2019) A coordinate-descent primal-dual algorithm with large step size and possibly nonseparable functions. SIAM J. Optim. 29(1):100–134.CrossrefGoogle Scholar
  • [12] Fercoq O, Richtarik P (2015) Accelerated, parallel and proximal coordinate descent. SIAM J. Optim. 25(4):1997–2023.CrossrefGoogle Scholar
  • [13] Friedman J, Hastie T, Hofling H, Tibshirani R (2007) Pathwise coordinate optimization. Ann. Appl. Statist. 1(2):302–332.CrossrefGoogle Scholar
  • [14] Gratton S, Royer CW, Vicente LN, Zhang Z (2015) Direct search based on probabilistic descent. SIAM J. Optim. 25(3):1515–1541.CrossrefGoogle Scholar
  • [15] Grippo L, Sciandrone M (2000) On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Oper. Res. Lett. 26(3):127–136.CrossrefGoogle Scholar
  • [16] Grishchenko D, Iutzeler F, Malick J (2021) Proximal gradient methods with adaptive subspace sampling. Math. Oper. Res. 46(4):1303–1323.LinkGoogle Scholar
  • [17] Hanzely F, Mishchenko K, Richtarik P (2018) SEGA: Variance reduction via gradient sketching. Adv. Neural Inform. Processing Systems, vol. 31 (Curran Associates Inc., Red Hook, NY), 2082–2093.Google Scholar
  • [18] Kane DM, Nelson J (2014) Sparser Johnson-Lindenstrauss transforms. J. ACM 61(1):1–23.CrossrefGoogle Scholar
  • [19] Kozak D, Becker S, Doostan A, Tenorio L (2021) A stochastic subspace approach to gradient-free optimization in high dimensions. Comput. Optim. Appl. 79:339–368.CrossrefGoogle Scholar
  • [20] Latafat P, Themelis A, Patrinos P (2022) Block-coordinate and incremental aggregated proximal gradient methods for nonsmooth nonconvex problems. Math. Programming 193:195–224.CrossrefGoogle Scholar
  • [21] Lu Z, Xiao L (2015) On the complexity analysis of randomized block-coordinate descent methods. Math. Programming 152(1–2):615–642.CrossrefGoogle Scholar
  • [22] Lu H, Freund RM, Nesterov Yu (2018) Relatively smooth convex optimization by first-order methods and applications. SIAM J. Optim. 28(1):333–354.CrossrefGoogle Scholar
  • [23] Mahoney M (2016) Lecture notes on randomized linear algebra. Preprint, submitted August 16, https://arxiv.org/abs/1608.0448.Google Scholar
  • [24] Maulen-Soto R, Fadili J, Attouch H (2022) An SDE perspective on stochastic convex optimization. Preprint, submitted July 6, https://arxiv.org/abs/2207.02750.Google Scholar
  • [25] Mitchell T (1997) Machine Learning (McGraw Hill, New York).Google Scholar
  • [26] Necoara I (2013) Random coordinate descent algorithms for multi-agent convex optimization over networks. IEEE Trans. Automatic Control 58(8):2001–2012.CrossrefGoogle Scholar
  • [27] Necoara I, Clipici D (2013) Efficient parallel coordinate descent algorithm for convex optimization problems with separable constraints: Application to distributed MPC. J. Process Control 23(3):243–253.CrossrefGoogle Scholar
  • [28] Necoara I, Clipici D (2016) Parallel random coordinate descent methods for composite minimization: Convergence analysis and error bounds. SIAM J. Optim. 26(1):197–226.CrossrefGoogle Scholar
  • [29] Necoara I, Patrascu A (2014) A random coordinate descent algorithm for optimization problems with composite objective function and linear coupled constraints. Comput. Optim. Appl. 57:307–337.CrossrefGoogle Scholar
  • [30] Necoara I, Takac M (2021) Randomized sketch descent methods for non-separable linearly constrained optimization. IMA J. Numerical Anal. 41(2):1056–1092.CrossrefGoogle Scholar
  • [31] Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.CrossrefGoogle Scholar
  • [32] Nesterov Y (2018) Lectures on Convex Optimization, 2nd ed. (Springer, Berlin).CrossrefGoogle Scholar
  • [33] Nesterov Y (2022) Inexact basic tensor methods for some classes of convex optimization problems. Optim. Methods Software 37(3):878–906.CrossrefGoogle Scholar
  • [34] Nesterov Y, Polyak BT (2006) Cubic regularization of Newton method and its global performance. Math. Programming 108:177–205.CrossrefGoogle Scholar
  • [35] Richtarik P, Takac M (2014) Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function. Math. Programming 144:1–38.CrossrefGoogle Scholar
  • [36] Rudin W (1987) Real and Complex Analysis (McGraw-Hill, New York).Google Scholar
  • [37] Shao Z (2022) On random embeddings and their applications to optimization. Unpublished PhD thesis, University of Oxford, Oxford, UK.Google Scholar
  • [38] Tseng P, Yun S (2009) Block-coordinate gradient descent method for linearly constrained nonsmooth separable optimization. J. Optim. Theory Appl. 140:513–535.CrossrefGoogle Scholar
  • [39] Wang Z (2020) The exact modulus of the generalized Kurdyka-Lojasiewicz property. Unpublished master’s thesis, The University of British Columbia, Vancouver.Google Scholar
  • [40] Wright SJ (2015) Coordinate descent algorithms. Math. Programming 151(1):3–34.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.