A Superlinearly Convergent Subgradient Method for Sharp Semismooth Problems

Vasileios Charisopoulos
Corresponding Author
Vasileios Charisopoulos
[email protected]
https://orcid.org/0000-0002-3717-0236
School of Operations Research & Information Engineering, Cornell University, Ithaca, New York 14853
Search for more papers by this author
,
Damek Davis
Damek Davis
[email protected]
https://orcid.org/0000-0003-2105-4641
School of Operations Research & Information Engineering, Cornell University, Ithaca, New York 14853
Search for more papers by this author

Vasileios Charisopoulos

Corresponding Author

Vasileios Charisopoulos

[email protected]

https://orcid.org/0000-0002-3717-0236

School of Operations Research & Information Engineering, Cornell University, Ithaca, New York 14853

Search for more papers by this author

Damek Davis

[email protected]

https://orcid.org/0000-0003-2105-4641

School of Operations Research & Information Engineering, Cornell University, Ithaca, New York 14853

Search for more papers by this author

Published Online:16 Aug 2023https://doi.org/10.1287/moor.2023.1390

References

[1] Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, et al. (2016) Tensorflow: A system for large-scale machine learning. 12th USENIX Sympos. Operating Systems Design Implementation (OSDI 16) (USENIX Association, Savannah, GA), 265–283.Google Scholar
[2] Balan R, Casazza P, Edidin D (2006) On signal reconstruction without phase. Appl. Comput. Harmonic Anal. 20(3):345–356.Crossref, Google Scholar
[3] Banach S (1922) Sur les opérations dans les ensembles abstraits et leur application aux équations intégrales. Fundamenta Mathematicae 3(1):133–181.Crossref, Google Scholar
[4] Bauschke HH, Noll D, Phan HM (2015) Linear and strong convergence of algorithms involving averaged nonexpansive operators. J. Math. Anal. Appl. 421(1):1–20.Crossref, Google Scholar
[5] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient methods and deep learning. Math. Programming 188(1):19–51.Crossref, Google Scholar
[6] Bolte J, Daniilidis A, Lewis A (2009) Tame functions are semismooth. Math. Programming 117(1):5–19.Google Scholar
[7] Boumal N (2023) An Introduction to Optimization on Smooth Manifolds (Cambridge University Press, Cambridge).Google Scholar
[8] Candes EJ, Tao T (2005) Decoding by linear programming. IEEE Trans. Inform. Theory 51(12):4203–4215.Crossref, Google Scholar
[9] Candes E, Tao T (2006) Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52(12):5406–5425.Google Scholar
[10] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21:1505–1593.Google Scholar
[11] Chen SS, Donoho DL, Saunders MA (2001) Atomic decomposition by basis pursuit. SIAM Rev. 43(1):129–159.Crossref, Google Scholar
[12] Conca A, Edidin D, Hering M, Vinzant C (2015) An algebraic characterization of injectivity in phase retrieval. Appl. Comput. Harmonic Anal. 38(2):346–356.Crossref, Google Scholar
[13] Davis D, Drusvyatskiy D (2021) Conservative and semismooth derivatives are equivalent for semialgebraic maps. Set-Valued Variational Anal. 30:453–463.Google Scholar
[14] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://doi.org/10.48550/arXiv.2108.11832.Google Scholar
[15] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.Crossref, Google Scholar
[16] Davis D, Drusvyatskiy D, MacPhee KJ, Paquette C (2018) Subgradient methods for sharp weakly convex functions. J. Optim. Theory Appl. 179(3):962–982.Google Scholar
[17] Díaz M, Grimmer B (2023) Optimal convergence rates for the proximal bundle method. 33(2). https://epubs.siam.org/doi/10.1137/21M1428601.Google Scholar
[18] Donoho DL (2006) Compressed sensing. IEEE Trans. Inform. Theory 52(4):1289–1306.Crossref, Google Scholar
[19] Drusvyatskiy D (2013) Slope and geometry in variational mathematics. PhD thesis, Cornell University, Ithaca, NY.Google Scholar
[20] Drusvyatskiy D, Ioffe AD, Lewis AS (2015) Curves of descent. SIAM J. Control Optim. 53(1):114–138.Crossref, Google Scholar
[21] Drusvyatskiy D, Ioffe AD, Lewis AS (2015) Transversality and alternating projections for nonconvex sets. Foundations Comput. Math. 15(6):1637–1651.Crossref, Google Scholar
[22] Du Y, Ruszczyński A (2017) Rate of convergence of the bundle method. J. Optim. Theory Appl. 173(3):908–922.Crossref, Google Scholar
[23] Duchi JC, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.Google Scholar
[24] Emiel G, Sagastizábal C (2010) Incremental-like bundle methods with application to energy planning. Comput. Optim. Appl. 46(2):305–332.Crossref, Google Scholar
[25] Eremin I (1965) The relaxation method of solving systems of inequalities with convex functions on the left-hand side. Proc. USSR Acad. Sci. 160:994–996.Google Scholar
[26] Ermol’ev YM, Norkin V (1998) Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. Cybernetics Systems Anal. 34(2):196–215.Crossref, Google Scholar
[27] Facchinei F, Pang JS (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, New York).Google Scholar
[28] Facchinei F, Fischer A, Herrich M (2014) An LP-Newton method: Nonsmooth equations, KKT systems, and nonisolated solutions. Math. Programming 146(1):1–36.Crossref, Google Scholar
[29] Goffin J (1977) On convergence rates of subgradient optimization methods. Math. Programming 13(3):329–347.Google Scholar
[30] Golub GH, Van Loan CF (2013) Matrix Computations. 4th ed. (Johns Hopkins University Press, Baltimore).Google Scholar
[31] Guntuboyina A (2012) Optimal rates of convergence for convex set estimation from support functions. Ann. Statist. 40(1):385–411.Google Scholar
[32] Hare W, Sagastizábal C (2010) A redistributed proximal bundle method for nonconvex optimization. SIAM J. Optim. 20(5):2442–2473.Crossref, Google Scholar
[33] Hoffman AJ (1952) On approximate solutions of systems of linear inequalities. J. Res. National Bureau Standards 49(4):263–265.Crossref, Google Scholar
[34] Ioffe AD (2017) Variational Analysis of Regular Mappings: Theory and Applications, Springer Monographs in Mathematics (Springer, Cham, Switzerland).Crossref, Google Scholar
[35] Izmailov AF, Solodov MV (2014) Newton-Type Methods for Optimization and Variational Problems (Springer, Berlin).Crossref, Google Scholar
[36] Johnstone PR, Moulin P (2020) Faster subgradient methods for functions with Hölderian growth. Math. Programming 180(1):417–450.Google Scholar
[37] Kiwiel KC (1985) A linearization algorithm for nonsmooth minimization. Math. Oper. Res. 10(2):185–194.Link, Google Scholar
[38] Kiwiel KC (2000) Efficiency of proximal bundle methods. J. Optim. Theory Appl. 104(3):589–603.Crossref, Google Scholar
[39] Klatte D, Kummer B (2006) Nonsmooth Equations in Optimization: Regularity, Calculus, Methods and Applications, vol. 60 (Springer Science & Business Media, New York).Google Scholar
[40] Krasnosel’skii MA (1955) Two comments on the method of successive approximations. Uspekhi Matematicheskikh Nauk 10:123–127.Google Scholar
[41] Kummer B (1988) Newton’s method for non-differentiable functions. Math. Res. 45:114–125.Google Scholar
[42] Lee JM (2013) Smooth manifolds. Introduction to Smooth Manifolds (Springer, Berlin), 1–31.Crossref, Google Scholar
[43] Lemarechal C (1975) An extension of davidon methods to nondifferentiable problems. Balinski ML, Wolfe P, eds. Nondifferentiable Optimization, Mathematical Programming Studies, vol. 3 (Springer, Berlin), 95–109.Crossref, Google Scholar
[44] Lewis A, Tian T (2021) The structure of conservative gradient fields. 31(3). https://epubs.siam.org/doi/10.1137/21M1393637.Google Scholar
[45] Lewis AS, Malick J (2008) Alternating projections on manifolds. Math. Oper. Res. 33(1):216–234.Google Scholar
[46] Liang J, Monteiro RD (2021) A proximal bundle variant with optimal iteration-complexity for a large range of prox stepsizes. SIAM J. Optim. 31(4):2955–2986.Crossref, Google Scholar
[47] Mann WR (1953) Mean value methods in iteration. Proc. Amer. Math. Soc. 4(3):506–510.Crossref, Google Scholar
[48] Mifflin R (1977) Semismooth and semiconvex functions in constrained optimization. SIAM J. Optim. 15(6):959–972.Crossref, Google Scholar
[49] Mifflin R, Sagastizábal C (2005) A VU-algorithm for convex minimization. Math. Programming 104(2):583–608.Crossref, Google Scholar
[50] Norkin VI (1986) Stochastic generalized-differentiable functions in the problem of nonconvex nonsmooth stochastic optimization. Cybernetics 22(6):804–809.Crossref, Google Scholar
[51] Nurminskii EA (1973) The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1):145–150.Crossref, Google Scholar
[52] Nurminskii EA (1974) Minimization of nondifferentiable functions in the presence of noise. Cybernetics 10(4):619–621.Crossref, Google Scholar
[53] Oliveira Wd, Sagastizábal C (2014) Bundle methods in the XXIst century: A bird’s-eye view. Pesquisa Operacional 34(3):647–670.Crossref, Google Scholar
[54] Pang CHJ (2015) Nonconvex set intersection problems: From projection methods to the Newton method for super-regular sets. Preprint, submitted June 27, https://doi.org/10.48550/arXiv.1506.08246.Google Scholar
[55] Pang CJ (2015) Set intersection problems: Supporting hyperplanes and quadratic programming. Math. Programming 149(1):329–359.Crossref, Google Scholar
[56] Pang JS (1997) Error bounds in mathematical programming. Math. Programming 79(1–3):299–332.Crossref, Google Scholar
[57] Polyak BT (1969) Minimization of unsmooth functionals. USSR Comput. Math. Math. Physics 9(3):14–29.Crossref, Google Scholar
[58] Polyak BT (1978) Subgradient methods: A survey of Soviet research. Nonsmoothe Optim. Proc. IIASA Workshop, vol. 3 (Pergamon, Oxford, UK), 5–29.Google Scholar
[59] Prince JL, Willsky AS (1990) Reconstructing convex sets from support line measurements. IEEE Trans. Pattern Anal. Machine Intelligence 12(4):377–389.Crossref, Google Scholar
[60] Qi L (1993) Convergence analysis of some algorithms for solving nonsmooth equations. Math. Oper. Res. 18(1):227–244.Link, Google Scholar
[61] Qi L, Sun J (1993) A nonsmooth version of Newton’s method. Math. Programming 58(1):353–367.Crossref, Google Scholar
[62] Rockafellar R, Wets RB (1998) Variational Analysis, Grundlehren der Mathematischen Wissenschaften, vol. 317 (Springer, Berlin).Crossref, Google Scholar
[63] Sagastizábal C (2012) Divide to conquer: Decomposition methods for energy optimization. Math. Programming 134(1):187–222.Crossref, Google Scholar
[64] Schreiber R, Van Loan C (1989) A storage-efficient WY representation for products of householder transformations. SIAM J. Sci. Statist. Comput. 10(1):53–57.Google Scholar
[65] Shor N (1970) The rate of convergence of the method of the generalized gradient descent with expansion of space. Kibernetika (Kiev) 2:80–85.Google Scholar
[66] Supittayapornpong S, Neely M (2016) Staggered time average algorithm for stochastic non-smooth optimization with O(1/t) convergence. Preprint, submitted July 11, https://doi.org/10.48550/arXiv.1607.02842.Google Scholar
[67] Themelis A, Patrinos P (2019) Supermann: A superlinearly convergent algorithm for finding fixed points of nonexpansive operators. IEEE Trans. Automatic Control 64(12):4875–4890.Crossref, Google Scholar
[68] Tibshirani RJ (2013) The lasso problem and uniqueness. Electronic J. Statist. 7:1456–1490.Crossref, Google Scholar
[69] Tseng P (2010) Approximation accuracy, gradient methods, and error bound for structured convex optimization. Math. Programming 125(2):263–295.Crossref, Google Scholar
[70] von Neumann J (1950) Functional Operators (AM-22), Volume 2: The Geometry of Orthogonal Spaces, Annals of Mathematics Studies, vol. 22 (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
[71] Waldspurger I (2018) Phase retrieval with random gaussian sensing vectors by alternating projections. IEEE Trans. Inform. Theory 64(5):3301–3312.Crossref, Google Scholar
[72] Whitney H (1992) Local properties of analytic varieties. Eells J, Toledo D, eds. Hassler Whitney Collected Papers, Contemporary Mathematicians (Birkhäuser, Boston), 497–536.Crossref, Google Scholar
[73] Whitney H (1992) Tangents to an analytic variety. Eells J, Toledo D, eds. Hassler Whitney Collected Papers, Contemporary Mathematicians (Birkhäuser, Boston), 537–590.Crossref, Google Scholar
[74] Wolfe P (1975) A method of conjugate subgradients for minimizing nondifferentiable functions. Balinski ML, Wolfe P, eds. Nondifferentiable Optimization, Mathematical Programming Studies, vol. 3 (Springer, Berlin), 145–173.Crossref, Google Scholar
[75] Yang T, Lin Q (2018) RSG: Beating subgradient method without smoothness and strong convexity. J. Machine Learn. Res. 19(6):1–33.Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 3

August 2024

Pages 1303-2047, C2

Article Information

Metrics

Information

Received:January 24, 2022
Accepted:November 13, 2022
Published Online:August 16, 2023

Cite as

Vasileios Charisopoulos, Damek Davis (2023) A Superlinearly Convergent Subgradient Method for Sharp Semismooth Problems. Mathematics of Operations Research 49(3):1678-1709.

https://doi.org/10.1287/moor.2023.1390

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

A Superlinearly Convergent Subgradient Method for Sharp Semismooth Problems

References

Volume 49, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News