Stochastic Subgradient Descent Escapes Active Strict Saddles on Weakly Convex Functions

Published Online:https://doi.org/10.1287/moor.2021.0194

References

  • [1] Attouch H, Bolte J, Svaiter B (2011) Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 137:91–124.CrossrefGoogle Scholar
  • [2] Benaïm M (1999) Dynamics of stochastic approximation algorithms. Azéma J, Émery M, Ledoux M, Yor M, eds. Séminaire de Probabilités XXXIII, Lecture Notes in Mathematics, vol. 1709 (Springer, Berlin), 1–68.CrossrefGoogle Scholar
  • [3] Benaïm M, Hofbauer J, Sorin S (2005) Stochastic approximations and differential inclusions. SIAM J. Control Optim. 44(1):328–348.CrossrefGoogle Scholar
  • [4] Bianchi P, Hachem W, Schechtman S (2022) Convergence of constant step stochastic gradient descent for non-smooth non-convex functions. Set-Valued Var. Anal. 30:1117–1147.CrossrefGoogle Scholar
  • [5] Bierstone E, Milman P (1988) Semianalytic and subanalytic sets. Publications Math. l’IHÉS 67:5–42.CrossrefGoogle Scholar
  • [6] Bolte J, Pauwels E (2020) A mathematical model for automatic differentiation in machine learning. Adv. Neural Inform. Processing Systems 33:10809–10819.Google Scholar
  • [7] Bolte J, Pauwels E (2021) Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning. Math. Program. 188:19–51.CrossrefGoogle Scholar
  • [8] Bolte J, Daniilidis A, Lewis A (2009) Tame functions are semismooth. Math. Program. 117:5–19.CrossrefGoogle Scholar
  • [9] Bolte J, Le T, Pauwels E (2022) Subgradient sampling for nonsmooth nonconvex minimization. Preprint, submitted February 28, https://arxiv.org/abs/2202.13744.Google Scholar
  • [10] Bolte J, Daniilidis A, Lewis A, Shiota M (2007) Clarke subgradients of stratifiable functions. SIAM J. Optim. 18(2):556–572.CrossrefGoogle Scholar
  • [11] Boumal N (2020) An introduction to optimization on smooth manifolds. Accessed August 29, 2023, http://www.nicolasboumal.net/book.Google Scholar
  • [12] Brandière O, Duflo M (1996) Les algorithmes stochastiques contournent-ils les pièges? Ann. Inst. Henri Poincare Probab. Statist. 32(3):395–427.Google Scholar
  • [13] Burke JV, Moré JJ (1988) On the identification of active constraints. SIAM J. Numer. Anal. 25(5):1197–1211.CrossrefGoogle Scholar
  • [14] Clarke FH, Ledyaev YS, Stern RJ, Wolenski PR (1998) Nonsmooth Analysis and Control Theory, Graduate Texts in Mathematics, vol. 178 (Springer-Verlag, New York).Google Scholar
  • [15] Coste M (2000) An Introduction to o-Minimal Geometry (Istituti Editoriali e Poligrafici Internazionali, Pisa, Italy).Google Scholar
  • [16] Daniilidis A, Drusvyatskiy D (2020) Pathological subgradient dynamics. SIAM J. Optim. 30(2):1327–1338.CrossrefGoogle Scholar
  • [17] Davis D, Drusvyatskiy D (2022) Proximal methods avoid active strict saddles of weakly convex functions. Foundations Comput. Math. 22:561–606.CrossrefGoogle Scholar
  • [18] Davis D, Drusvyatskiy D, Charisopoulos V (2019) Stochastic algorithms with geometric step decay converge linearly on sharp functions. Preprint, submitted July 22, https://arxiv.org/abs/1907.09547.Google Scholar
  • [19] Davis D, Drusvyatskiy D, Jiang L (2021) Subgradient methods near active manifolds: Saddle point avoidance, local convergence, and asymptotic normality. Preprint, submitted August 26, https://arxiv.org/abs/2108.11832.Google Scholar
  • [20] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20:119–154.CrossrefGoogle Scholar
  • [21] Drusvyatskiy D, Lewis A (2013) Semi-algebraic functions have small subdifferentials. Math. Program. 140:5–29.CrossrefGoogle Scholar
  • [22] Drusvyatskiy D, Lewis A (2014) Optimality, identifiability, and sensitivity. Math. Program. 147:467–498.CrossrefGoogle Scholar
  • [23] Drusvyatskiy D, Ioffe A, Lewis A (2016) Generic minimizing behavior in semialgebraic optimization. SIAM J. Optim. 26(1):513–534.CrossrefGoogle Scholar
  • [24] Horn RA, Johnson CR (1994) Topics in Matrix Analysis, corrected reprint of the 1991 original (Cambridge University Press, Cambridge, UK).Google Scholar
  • [25] Ioffe AD (2009) An invitation to tame optimization. SIAM J. Optim. 19(4):1894–1917.CrossrefGoogle Scholar
  • [26] Kelley A (1967) The stable, center-stable, center, center-unstable, unstable manifolds. J. Differential Equations 3(4):546–570.CrossrefGoogle Scholar
  • [27] Kurdyka K (1998) On gradients of functions definable in o-minimal structures. Ann. Inst. Fourier Grenoble. 48(3):769–783.CrossrefGoogle Scholar
  • [28] Lafontaine J (2015) An Introduction to Differential Manifolds (Springer International Publishing, Cham, Switzerland).CrossrefGoogle Scholar
  • [29] Lai T, Wei C (1983) A note on Martingale difference sequences satisfying the local Marcinkiewicz-Zygmund condition. Bull. Inst. Math. Acad. Sinica 11:1–13.Google Scholar
  • [30] Lee JD, Simchowitz M, Jordan MI, Recht B (2016) Gradient descent only converges to minimizers. Proc. Annu. Conf. Learning Theory 49:1246–1257.Google Scholar
  • [31] Lewis A (2002) Active sets, nonsmoothness, and sensitivity. SIAM J. Optim. 13:702–725.CrossrefGoogle Scholar
  • [32] Lewis A, Malick J (2008) Alternating projections on manifolds. Math. Oper. Res. 33(1):216–234.LinkGoogle Scholar
  • [33] Loi T (1998) Verdier and strict Thom stratifications in o-minimal structures. Illinois J. Math. 42(2):347–356.CrossrefGoogle Scholar
  • [34] Majewski S, Miasojedow B, Moulines E (2018) Analysis of nonsmooth stochastic approximation: The differential inclusion approach. Preprint, submitted May 4, https://arxiv.org/abs/1805.01916.Google Scholar
  • [35] Pemantle R (1990) Nonconvergence to unstable points in urn models and stochastic approximations. Ann. Probab. 18(2):698–712.CrossrefGoogle Scholar
  • [36] Robbins H, Siegmund D (1971) A convergence theorem for non negative almost supermartingales and some applications. Rustagi JS, ed. Optimizing Methods in Statistics (Academic Press, New York), 233–257.Google Scholar
  • [37] Schechtman S (2021) Some problems in nonconvex stochastic optimization. Thesis, Université Gustave Eiffel, Champs-sur-Marne. Accessed August 29, 2023, https://tel.archives-ouvertes.fr/tel-03698454.Google Scholar
  • [38] Tarrès P (2000) Pièges répulsifs. Comptes Rendus Acad. Sci. Ser. 1 Math. 330(2):125–130.Google Scholar
  • [39] Van den Dries L, Miller C (1996) Geometric categories and o-minimal structures. Duke Math. J. 84(2):497–540.CrossrefGoogle Scholar
  • [40] Wilkie A (2007) O-minimal structures. Séminaire Bourbaki 985:131–142.Google Scholar
  • [41] Williams D (1991) Probability with Martingales (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [42] Wright SJ (1993) Identifiable surfaces in constrained optimization. SIAM J. Control Optim. 31(4):1063–1079.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.