Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions

Pascal Bianchi
Pascal Bianchi
[email protected]
Laboratoire Traitement et Communication de l’Information, Télécom Paris, Institut Polytechnique du Paris, 91120 Palaiseau, France;
Search for more papers by this author
,
Walid Hachem
Walid Hachem
[email protected]
https://orcid.org/0000-0001-8499-2761
Laboratoire d’Informatique Gaspard-Monge, Centre National de la Recherche Scientifique, Université Gustave Eiffel, École supérieure d’ingénieurs en électrotechnique et électronique Paris, 77420 Champs-sur-Marne, France
Search for more papers by this author
,
Sholom Schechtman
Corresponding Author
Sholom Schechtman
[email protected]
https://orcid.org/0000-0002-5390-4279
Laboratoire d’Informatique Gaspard-Monge, Centre National de la Recherche Scientifique, Université Gustave Eiffel, École supérieure d’ingénieurs en électrotechnique et électronique Paris, 77420 Champs-sur-Marne, France
Search for more papers by this author

Laboratoire Traitement et Communication de l’Information, Télécom Paris, Institut Polytechnique du Paris, 91120 Palaiseau, France;

Search for more papers by this author

Walid Hachem

[email protected]

https://orcid.org/0000-0001-8499-2761

Laboratoire d’Informatique Gaspard-Monge, Centre National de la Recherche Scientifique, Université Gustave Eiffel, École supérieure d’ingénieurs en électrotechnique et électronique Paris, 77420 Champs-sur-Marne, France

Search for more papers by this author

Sholom Schechtman

Corresponding Author

Sholom Schechtman

[email protected]

https://orcid.org/0000-0002-5390-4279

Search for more papers by this author

Published Online:7 Sep 2023https://doi.org/10.1287/moor.2021.0194

References

[1] Attouch H, Bolte J, Svaiter B (2011) Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137:91–124.Crossref, Google Scholar
[2] Benaïm M (1999) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.Crossref, Google Scholar
[3] Benaïm M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.Crossref, Google Scholar
[4] Bianchi P, Hachem W, Schechtman S (2022) Convergence of constant step stochastic gradient descent for non-smooth non-convex functions. Set-Valued Var. Anal. 30:1117–1147.Crossref, Google Scholar
[5] Bierstone E, Milman P (1988) Semianalytic and subanalytic sets. Publications Math. l’IHÉS 67:5–42.Crossref, Google Scholar
[6] Bolte J, Pauwels E (2020) A mathematical model for automatic differentiation in machine learning. Adv. Neural Inform. Processing Systems 33:10809–10819.Google Scholar
[7] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. Math. Program. 188:19–51.Crossref, Google Scholar
[8] Bolte J, Daniilidis A, Lewis A (2009) Tame functions are semismooth. Math. Program. 117:5–19.Crossref, Google Scholar
[9] Bolte J, Le T, Pauwels E (2022) Subgradient sampling for nonsmooth nonconvex minimization. Preprint, submitted February 28, https://arxiv.org/abs/2202.13744.Google Scholar
[10] Bolte J, Daniilidis A, Lewis A, Shiota M (2007) Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2):556–572.Crossref, Google Scholar
[11] Boumal N (2020) An introduction to optimization on smooth manifolds. Accessed August 29, 2023, http://www.nicolasboumal.net/book.Google Scholar
[12] Brandière O, Duflo M (1996) Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. Henri Poincare Probab. Statist. 32(3):395–427.Google Scholar
[13] Burke JV, Moré JJ (1988) On the identification of active constraints. SIAM J. Numer. Anal. 25(5):1197–1211.Crossref, Google Scholar
[14] Clarke FH, Ledyaev YS, Stern RJ, Wolenski PR (1998) Nonsmooth Analysis and Control Theory, Graduate Texts in Mathematics, vol. 178 (Springer-Verlag, New York).Google Scholar
[15] Coste M (2000) An Introduction to o-Minimal Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
[16] Daniilidis A, Drusvyatskiy D (2020) Pathological subgradient dynamics. SIAM J. Optim. 30(2):1327–1338.Crossref, Google Scholar
[17] Davis D, Drusvyatskiy D (2022) Proximal methods avoid active strict saddles of weakly convex functions. Foundations Comput. Math. 22:561–606.Crossref, Google Scholar
[18] Davis D, Drusvyatskiy D, Charisopoulos V (2019) Stochastic algorithms with geometric step decay converge linearly on sharp functions. Preprint, submitted July 22, https://arxiv.org/abs/1907.09547.Google Scholar
[19] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://arxiv.org/abs/2108.11832.Google Scholar
[20] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20:119–154.Crossref, Google Scholar
[21] Drusvyatskiy D, Lewis A (2013) Semi-algebraic functions have small subdifferentials. Math. Program. 140:5–29.Crossref, Google Scholar
[22] Drusvyatskiy D, Lewis A (2014) Optimality, identifiability, and sensitivity. Math. Program. 147:467–498.Crossref, Google Scholar
[23] Drusvyatskiy D, Ioffe A, Lewis A (2016) Generic minimizing behavior in semialgebraic optimization. SIAM J. Optim. 26(1):513–534.Crossref, Google Scholar
[24] Horn RA, Johnson CR (1994) Topics in Matrix Analysis, corrected reprint of the 1991 original (Cambridge University Press, Cambridge, UK).Google Scholar
[25] Ioffe AD (2009) An invitation to tame optimization. SIAM J. Optim. 19(4):1894–1917.Crossref, Google Scholar
[26] Kelley A (1967) The stable, center-stable, center, center-unstable, unstable manifolds. J. Differential Equations 3(4):546–570.Crossref, Google Scholar
[27] Kurdyka K (1998) On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier Grenoble. 48(3):769–783.Crossref, Google Scholar
[28] Lafontaine J (2015) An Introduction to Differential Manifolds (Springer International Publishing, Cham, Switzerland).Crossref, Google Scholar
[29] Lai T, Wei C (1983) A note on Martingale difference sequences satisfying the local Marcinkiewicz-Zygmund condition. Bull. Inst. Math. Acad. Sinica 11:1–13.Google Scholar
[30] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent only converges to minimizers. Proc. Annu. Conf. Learning Theory 49:1246–1257.Google Scholar
[31] Lewis A (2002) Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13:702–725.Crossref, Google Scholar
[32] Lewis A, Malick J (2008) Alternating projections on manifolds. Math. Oper. Res. 33(1):216–234.Link, Google Scholar
[33] Loi T (1998) Verdier and strict Thom stratifications in o-minimal structures. Illinois J. Math. 42(2):347–356.Crossref, Google Scholar
[34] Majewski S, Miasojedow B, Moulines E (2018) Analysis of nonsmooth stochastic approximation: The differential inclusion approach. Preprint, submitted May 4, https://arxiv.org/abs/1805.01916.Google Scholar
[35] Pemantle R (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18(2):698–712.Crossref, Google Scholar
[36] Robbins H, Siegmund D (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
[37] Schechtman S (2021) Some problems in nonconvex stochastic optimization. Thesis, Université Gustave Eiffel, Champs-sur-Marne. Accessed August 29, 2023, https://tel.archives-ouvertes.fr/tel-03698454.Google Scholar
[38] Tarrès P (2000) Pièges répulsifs. Comptes Rendus Acad. Sci. Ser. 1 Math. 330(2):125–130.Google Scholar
[39] Van den Dries L, Miller C (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.Crossref, Google Scholar
[40] Wilkie A (2007) O-minimal structures. Séminaire Bourbaki 985:131–142.Google Scholar
[41] Williams D (1991) Probability with Martingales (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[42] Wright SJ (1993) Identifiable surfaces in constrained optimization. SIAM J. Control Optim. 31(4):1063–1079.Crossref, Google Scholar

cover image Mathematics of Operations Research

Volume 49, Issue 3

August 2024

Pages 1303-2047, C2

Article Information

Metrics

Information

Received:August 04, 2021
Accepted:July 23, 2023
Published Online:September 07, 2023

Cite as

Pascal Bianchi, Walid Hachem, Sholom Schechtman (2023) Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions. Mathematics of Operations Research 49(3):1761-1790.

https://doi.org/10.1287/moor.2021.0194

Keywords

Acknowledgments

The authors thank the anonymous reviewers for their outstanding job of refereeing, and in particular of pointing out two serious flaws in the first version of this work.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions

References

Volume 49, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News