A Superlinearly Convergent Subgradient Method for Sharp Semismooth Problems

Published Online:https://doi.org/10.1287/moor.2023.1390

References

  • [1] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, et al. (2016) Tensorflow: A system for large-scale machine learning. 12th USENIX Sympos. Operating Systems Design Implementation (OSDI 16) (USENIX Association, Savannah, GA), 265–283.Google Scholar
  • [2] Balan R, Casazza P, Edidin D (2006) On signal reconstruction without phase. Appl. Comput. Harmonic Anal. 20(3):345–356.CrossrefGoogle Scholar
  • [3] Banach S (1922) Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fundamenta Mathematicae 3(1):133–181.CrossrefGoogle Scholar
  • [4] Bauschke HH, Noll D, Phan HM (2015) Linear and strong convergence of algorithms involving averaged nonexpansive operators. J. Math. Anal. Appl. 421(1):1–20.CrossrefGoogle Scholar
  • [5] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Programming 188(1):19–51.CrossrefGoogle Scholar
  • [6] Bolte J, Daniilidis A, Lewis A (2009) Tame functions are semismooth. Math. Programming 117(1):5–19.Google Scholar
  • [7] Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge University Press, Cambridge).Google Scholar
  • [8] Candes EJ, Tao T (2005) Decoding by linear programming. IEEE Trans. Inform. Theory 51(12):4203–4215.CrossrefGoogle Scholar
  • [9] Candes E, Tao T (2006) Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52(12):5406–5425.Google Scholar
  • [10] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21:1505–1593.Google Scholar
  • [11] Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev. 43(1):129–159.CrossrefGoogle Scholar
  • [12] Conca A, Edidin D, Hering M, Vinzant C (2015) An algebraic characterization of injectivity in phase retrieval. Appl. Comput. Harmonic Anal. 38(2):346–356.CrossrefGoogle Scholar
  • [13] Davis D, Drusvyatskiy D (2021) Conservative and semismooth derivatives are equivalent for semialgebraic maps. Set-Valued Variational Anal. 30:453–463.Google Scholar
  • [14] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://doi.org/10.48550/arXiv.2108.11832.Google Scholar
  • [15] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.CrossrefGoogle Scholar
  • [16] Davis D, Drusvyatskiy D, MacPhee KJ, Paquette C (2018) Subgradient methods for sharp weakly convex functions. J. Optim. Theory Appl. 179(3):962–982.Google Scholar
  • [17] Díaz M, Grimmer B (2023) Optimal convergence rates for the proximal bundle method. 33(2). https://epubs.siam.org/doi/10.1137/21M1428601.Google Scholar
  • [18] Donoho DL (2006) Compressed sensing. IEEE Trans. Inform. Theory 52(4):1289–1306.CrossrefGoogle Scholar
  • [19] Drusvyatskiy D (2013) Slope and geometry in variational mathematics. PhD thesis, Cornell University, Ithaca, NY.Google Scholar
  • [20] Drusvyatskiy D, Ioffe AD, Lewis AS (2015) Curves of descent. SIAM J. Control Optim. 53(1):114–138.CrossrefGoogle Scholar
  • [21] Drusvyatskiy D, Ioffe AD, Lewis AS (2015) Transversality and alternating projections for nonconvex sets. Foundations Comput. Math. 15(6):1637–1651.CrossrefGoogle Scholar
  • [22] Du Y, Ruszczyński A (2017) Rate of convergence of the bundle method. J. Optim. Theory Appl. 173(3):908–922.CrossrefGoogle Scholar
  • [23] Duchi JC, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.Google Scholar
  • [24] Emiel G, Sagastizábal C (2010) Incremental-like bundle methods with application to energy planning. Comput. Optim. Appl. 46(2):305–332.CrossrefGoogle Scholar
  • [25] Eremin I (1965) The relaxation method of solving systems of inequalities with convex functions on the left-hand side. Proc. USSR Acad. Sci. 160:994–996.Google Scholar
  • [26] Ermol’ev YM, Norkin V (1998) Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. Cybernetics Systems Anal. 34(2):196–215.CrossrefGoogle Scholar
  • [27] Facchinei F, Pang JS (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, New York).Google Scholar
  • [28] Facchinei F, Fischer A, Herrich M (2014) An LP-Newton method: Nonsmooth equations, KKT systems, and nonisolated solutions. Math. Programming 146(1):1–36.CrossrefGoogle Scholar
  • [29] Goffin J (1977) On convergence rates of subgradient optimization methods. Math. Programming 13(3):329–347.Google Scholar
  • [30] Golub GH, Van Loan CF (2013) Matrix Computations. 4th ed. (Johns Hopkins University Press, Baltimore).Google Scholar
  • [31] Guntuboyina A (2012) Optimal rates of convergence for convex set estimation from support functions. Ann. Statist. 40(1):385–411.Google Scholar
  • [32] Hare W, Sagastizábal C (2010) A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20(5):2442–2473.CrossrefGoogle Scholar
  • [33] Hoffman AJ (1952) On approximate solutions of systems of linear inequalities. J. Res. National Bureau Standards 49(4):263–265.CrossrefGoogle Scholar
  • [34] Ioffe AD (2017) Variational Analysis of Regular Mappings: Theory and Applications, Springer Monographs in Mathematics (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [35] Izmailov AF, Solodov MV (2014) Newton-Type Methods for Optimization and Variational Problems (Springer, Berlin).CrossrefGoogle Scholar
  • [36] Johnstone PR, Moulin P (2020) Faster subgradient methods for functions with Hölderian growth. Math. Programming 180(1):417–450.Google Scholar
  • [37] Kiwiel KC (1985) A linearization algorithm for nonsmooth minimization. Math. Oper. Res. 10(2):185–194.LinkGoogle Scholar
  • [38] Kiwiel KC (2000) Efficiency of proximal bundle methods. J. Optim. Theory Appl. 104(3):589–603.CrossrefGoogle Scholar
  • [39] Klatte D, Kummer B (2006) Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications, vol. 60 (Springer Science & Business Media, New York).Google Scholar
  • [40] Krasnosel’skii MA (1955) Two comments on the method of successive approximations. Uspekhi Matematicheskikh Nauk 10:123–127.Google Scholar
  • [41] Kummer B (1988) Newton’s method for non-differentiable functions. Math. Res. 45:114–125.Google Scholar
  • [42] Lee JM (2013) Smooth manifolds. Introduction to Smooth Manifolds (Springer, Berlin), 1–31.CrossrefGoogle Scholar
  • [43] Lemarechal C (1975) An extension of davidon methods to nondifferentiable problems. Balinski ML, Wolfe P, eds. Nondifferentiable Optimization, Mathematical Programming Studies, vol. 3 (Springer, Berlin), 95–109.CrossrefGoogle Scholar
  • [44] Lewis A, Tian T (2021) The structure of conservative gradient fields. 31(3). https://epubs.siam.org/doi/10.1137/21M1393637.Google Scholar
  • [45] Lewis AS, Malick J (2008) Alternating projections on manifolds. Math. Oper. Res. 33(1):216–234.Google Scholar
  • [46] Liang J, Monteiro RD (2021) A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes. SIAM J. Optim. 31(4):2955–2986.CrossrefGoogle Scholar
  • [47] Mann WR (1953) Mean value methods in iteration. Proc. Amer. Math. Soc. 4(3):506–510.CrossrefGoogle Scholar
  • [48] Mifflin R (1977) Semismooth and semiconvex functions in constrained optimization. SIAM J. Optim. 15(6):959–972.CrossrefGoogle Scholar
  • [49] Mifflin R, Sagastizábal C (2005) A VU-algorithm for convex minimization. Math. Programming 104(2):583–608.CrossrefGoogle Scholar
  • [50] Norkin VI (1986) Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization. Cybernetics 22(6):804–809.CrossrefGoogle Scholar
  • [51] Nurminskii EA (1973) The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1):145–150.CrossrefGoogle Scholar
  • [52] Nurminskii EA (1974) Minimization of nondifferentiable functions in the presence of noise. Cybernetics 10(4):619–621.CrossrefGoogle Scholar
  • [53] Oliveira Wd, Sagastizábal C (2014) Bundle methods in the XXIst century: A bird’s-eye view. Pesquisa Operacional 34(3):647–670.CrossrefGoogle Scholar
  • [54] Pang CHJ (2015) Nonconvex set intersection problems: From projection methods to the Newton method for super-regular sets. Preprint, submitted June 27, https://doi.org/10.48550/arXiv.1506.08246.Google Scholar
  • [55] Pang CJ (2015) Set intersection problems: Supporting hyperplanes and quadratic programming. Math. Programming 149(1):329–359.CrossrefGoogle Scholar
  • [56] Pang JS (1997) Error bounds in mathematical programming. Math. Programming 79(1–3):299–332.CrossrefGoogle Scholar
  • [57] Polyak BT (1969) Minimization of unsmooth functionals. USSR Comput. Math. Math. Physics 9(3):14–29.CrossrefGoogle Scholar
  • [58] Polyak BT (1978) Subgradient methods: A survey of Soviet research. Nonsmoothe Optim. Proc. IIASA Workshop, vol. 3 (Pergamon, Oxford, UK), 5–29.Google Scholar
  • [59] Prince JL, Willsky AS (1990) Reconstructing convex sets from support line measurements. IEEE Trans. Pattern Anal. Machine Intelligence 12(4):377–389.CrossrefGoogle Scholar
  • [60] Qi L (1993) Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18(1):227–244.LinkGoogle Scholar
  • [61] Qi L, Sun J (1993) A nonsmooth version of Newton’s method. Math. Programming 58(1):353–367.CrossrefGoogle Scholar
  • [62] Rockafellar R, Wets RB (1998) Variational Analysis, Grundlehren der Mathematischen Wissenschaften, vol. 317 (Springer, Berlin).CrossrefGoogle Scholar
  • [63] Sagastizábal C (2012) Divide to conquer: Decomposition methods for energy optimization. Math. Programming 134(1):187–222.CrossrefGoogle Scholar
  • [64] Schreiber R, Van Loan C (1989) A storage-efficient WY representation for products of householder transformations. SIAM J. Sci. Statist. Comput. 10(1):53–57.Google Scholar
  • [65] Shor N (1970) The rate of convergence of the method of the generalized gradient descent with expansion of space. Kibernetika (Kiev) 2:80–85.Google Scholar
  • [66] Supittayapornpong S, Neely M (2016) Staggered time average algorithm for stochastic non-smooth optimization with O(1/t) convergence. Preprint, submitted July 11, https://doi.org/10.48550/arXiv.1607.02842.Google Scholar
  • [67] Themelis A, Patrinos P (2019) Supermann: A superlinearly convergent algorithm for finding fixed points of nonexpansive operators. IEEE Trans. Automatic Control 64(12):4875–4890.CrossrefGoogle Scholar
  • [68] Tibshirani RJ (2013) The lasso problem and uniqueness. Electronic J. Statist. 7:1456–1490.CrossrefGoogle Scholar
  • [69] Tseng P (2010) Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Programming 125(2):263–295.CrossrefGoogle Scholar
  • [70] von Neumann J (1950) Functional Operators (AM-22), Volume 2: The Geometry of Orthogonal Spaces, Annals of Mathematics Studies, vol. 22 (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [71] Waldspurger I (2018) Phase retrieval with random gaussian sensing vectors by alternating projections. IEEE Trans. Inform. Theory 64(5):3301–3312.CrossrefGoogle Scholar
  • [72] Whitney H (1992) Local properties of analytic varieties. Eells J, Toledo D, eds. Hassler Whitney Collected Papers, Contemporary Mathematicians (Birkhäuser, Boston), 497–536.CrossrefGoogle Scholar
  • [73] Whitney H (1992) Tangents to an analytic variety. Eells J, Toledo D, eds. Hassler Whitney Collected Papers, Contemporary Mathematicians (Birkhäuser, Boston), 537–590.CrossrefGoogle Scholar
  • [74] Wolfe P (1975) A method of conjugate subgradients for minimizing nondifferentiable functions. Balinski ML, Wolfe P, eds. Nondifferentiable Optimization, Mathematical Programming Studies, vol. 3 (Springer, Berlin), 145–173.CrossrefGoogle Scholar
  • [75] Yang T, Lin Q (2018) RSG: Beating subgradient method without smoothness and strong convexity. J. Machine Learn. Res. 19(6):1–33.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.