Graphical Convergence of Subgradients in Nonconvex Optimization and Learning

Published Online:https://doi.org/10.1287/moor.2021.1126

References

  • [1] Albano P, Cannarsa P (1999) Singularities of semiconcave functions in Banach spaces.McEneaney WM, Yin GG, Zhang Q, eds. Stochastic analysis, control, optimization and application. Systems & Control: Foundations & Applications (Birkhäuser, Boston), 171–190.CrossrefGoogle Scholar
  • [2] Attouch H (1977) Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. C. R. Acad. Sci. Paris Sér. A-B. 284(10):A539–A542.Google Scholar
  • [3] Attouch H (1984) Variational Convergence for Functions and Operators. Applicable Mathematics Series (Pitman Advanced Publishing Program, Boston, MA).Google Scholar
  • [4] Attouch H, Wets RJB (1990) Epigraphical processes: laws of large numbers for random LSC functions. Sém. Anal. Convexe 20. Exp. 13:29.Google Scholar
  • [5] Bartlett PL, Bousquet O, Mendelson S (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.CrossrefGoogle Scholar
  • [6] Bartlett PL, Mendelson S (2002) Rademacher and gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3(Nov):463–482.Google Scholar
  • [7] Boucheron S, Lugosi G, Massart P (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, UK).CrossrefGoogle Scholar
  • [8] Bousquet O, Elisseeff A (2002) Stability and generalization. J. Mach. Learn. Res. 2(3):499–526.Google Scholar
  • [9] Clarke F (1983) Optimization and Nonsmooth Analysis (Wiley Interscience, New York).Google Scholar
  • [10] Davis D, Drusvyatskiy D (2018) Uniform graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted December 17, https://arxiv.org/pdf/1810.07590.pdf.Google Scholar
  • [11] Davis D, Drusvyatskiy D (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.CrossrefGoogle Scholar
  • [12] Davis D, Drusvyatskiy D, Kakade S, Lee J (2018) Stochastic subgradient method converges on tame functions. Preprint, submitted April 20, https://arxiv.org/abs/1804.07795.Google Scholar
  • [13] Davis D, Drusvyatskiy D, MacPhee K (2018) Stochastic model-based minimization under high-order growth. Preprint, submitted July 1, https://arxiv.org/abs/1807.00255.Google Scholar
  • [14] Davis D, Drusvyatskiy D, MacPhee KJ, Paquette C (2018) Subgradient methods for sharp weakly convex functions. J. Optim. Theory Appl. 179(3):962–982.Google Scholar
  • [15] Davis D, Drusvyatskiy D, Paquette C (2017) The nonsmooth landscape of phase retrieval. Preprint, submitted November 9, https://arxiv.org/abs/1711.03247.Google Scholar
  • [16] Davis D, Grimmer B (2019) Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM J. Optim. 29(3):1908–1930.CrossrefGoogle Scholar
  • [17] Drusvyatskiy D (2018) The proximal point method revisited. SIAG/OPT Views News. 26(1):1–8.Google Scholar
  • [18] Drusvyatskiy D, Paquette C (2019) Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178(1-2):503–558.CrossrefGoogle Scholar
  • [19] Duchi JC, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.CrossrefGoogle Scholar
  • [20] Duchi JC, Ruan F (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.CrossrefGoogle Scholar
  • [21] Foster D, Sekhari A, Sridharan K (2018) Uniform convergence of gradients for non-convex learning and optimization. Preprint, submitted October 25, https://arxiv.org/abs/1810.11059.Google Scholar
  • [22] Geyer CJ (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993–2010.Google Scholar
  • [23] Ghadimi S, Lan G, Zhang H (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1-2, Ser. A):267–305.Google Scholar
  • [24] Grünwald PD, Mehta NA (2016) Fast rates for general unbounded loss functions: from ERM to generalized Bayes. Preprint, submitted May 1, https://arxiv.org/abs/1605.00252.Google Scholar
  • [25] Hardt M, Recht B, Singer Y (2015) Train faster, generalize better: Stability of stochastic gradient descent. Preprint, submitted September 3, https://arxiv.org/abs/1509.01240.Google Scholar
  • [26] Kakade SM, Sridharan K, Tewari A (2009) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Adv. Neural Inform. Process. Syst. 21:793–800.Google Scholar
  • [27] Kaniovski YM, King AJ, Wets RJB (1995) Probabilistic bounds (via large deviations) for the solutions of stochastic programming problems. Ann. Oper. Res. 56(1):189–208.Google Scholar
  • [28] King AJ, Rockafellar RT (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.Google Scholar
  • [29] Koller D, Schuurmans D, Bengio Y, Bottou L, eds. (2009) Fast Rates for Regularized Objectives, vol. 21, Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 1545–1552.Google Scholar
  • [30] Kontorovich A (2014) Concentration in unbounded metric spaces and algorithmic stability. Internat. Conf. Mach. Learn. 32(2):28–36.Google Scholar
  • [31] Li X, Zhihui Z, So AC, Vidal R (2018) Nonconvex robust low-rank matrix recovery. Preprint, submitted September 24, https://arxiv.org/abs/1809.09237.Google Scholar
  • [32] Liu M, Zhang X, Zhang L, Jin R, Yang T (2018) Fast rates of erm and stochastic approximation: Adaptive to error bound conditions. Preprint, submitted May 11, https://arxiv.org/abs/1805.04577.Google Scholar
  • [33] Maurer A (2016) A vector-contraction inequality for Rademacher complexities. Ortner R, Simon HU, Zilles S, eds. Algorithmic Learning Theory (Springer International Publishing, Cham, Switzerland), 3–17.CrossrefGoogle Scholar
  • [34] McDiarmid C (1989) On the method of bounded differences. Surveys in Combinatorics, 1989 (Norwich, 1989), London Mathematical Society Lecture Note Series, vol. 141 (Cambridge University Press, Cambridge), 148–188.Google Scholar
  • [35] Mehta NA (2016) Fast rates with high probability in exp-concave statistical learning. Preprint, submitted May 4, https://arxiv.org/abs/1605.01288.Google Scholar
  • [36] Mehta NA, Williamson RC (2014) From stochastic mixability to fast rates. Proc. 27th Internat. Conf. Neural Inform. Processing Sys. (MIT Press, Cambridge, MA).Google Scholar
  • [37] Mei S, Bai Y, Montanari A (2018) The landscape of empirical risk for nonconvex losses. Ann. Statist. 46(6A):2747–2774.Google Scholar
  • [38] Mordukhovich B (2006) Variational Analysis and Generalized Differentiation I: Basic Theory, vol. 330 (Grundlehren der mathematischen Wissenschaften, Springer, Berlin).CrossrefGoogle Scholar
  • [39] Mordukhovich BS (2018) Variational Analysis and Applications. Springer Monographs in Mathematics (Springer, Cham, Switzerland).CrossrefGoogle Scholar
  • [40] Moreau JJ (1965) Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93:273–299.Google Scholar
  • [41] Nemirovski A, Juditsky A, Lan G, Shapiro A (2008) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Google Scholar
  • [42] Nurminskii EA (1973) The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1):145–150.Google Scholar
  • [43] Poliquin R, Rockafellar R (1996) Prox-regular functions in variational analysis. Trans. Amer. Math. Soc. 348:1805–1838.CrossrefGoogle Scholar
  • [44] Rachev ST, Römisch W (2002) Quantitative stability in stochastic programming: The method of probability metrics. Math. Oper. Res. 27(4):792–818.Google Scholar
  • [45] Rakhlin A, Mukherjee S, Poggio T (2005) Stability results in learning theory. Anal. Appl. 3(4):397–417.Google Scholar
  • [46] Ralph D, Xu H (2011) Convergence of stationary points of sample average two-stage stochastic programs: a generalized equation approach. Math. Oper. Res. 36(3):568–592.Google Scholar
  • [47] Robinson SM (1996) Analysis of sample-path optimization. Math. Oper. Res. 21(3):513–528.Google Scholar
  • [48] Rockafellar RT (1970) Convex Analysis. Princeton Mathematical Series, No. 28 (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [49] Rockafellar R (1982) Favorable classes of Lipschitz-continuous functions in subgradient optimization. Progress in Nondifferentiable Optimization, vol. 8, IIASA Collaborative Proc. Ser. CP-82 (International Institute for Applied Systems Analysis, Laxenburg, Austria), 125–143.Google Scholar
  • [50] Rockafellar R, Wets RB (1998) Variational Analysis, vol. 317 (Grundlehren der mathematischen Wissenschaften, Springer, Berlin).CrossrefGoogle Scholar
  • [51] Rolewicz S (1979) On paraconvex multifunctions. Third Symposium on Operations Research (Univ. Mannheim, Mannheim, 1978), section I, vol. 31, Oper. Res. Verfahren (Hain, Königstein/Ts.), 539–546.Google Scholar
  • [52] Römisch W, Wets R (2007) Stability of ɛ-approximate solutions to convex stochastic programs. SIAM J. Optim. 18(3):961–979.Google Scholar
  • [53] Shalev-Shwartz S, Ben-David S (2014) Understanding Machine Learning: From Theory to Algorithms, 1st ed. (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [54] Shalev-Shwartz S, Shamir O, Srebro N, Sridharan K (2009) Stochastic convex optimization. Proc. Conf. Learn. Theory (COLT).Google Scholar
  • [55] Shapiro A (2000) On the asymptotics of constrained local M-estimators. Ann. Statist. 28(3):948–960.Google Scholar
  • [56] Shapiro A (2000) Stochastic Programming by Monte Carlo Simulation Methods (Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Mathematik, Berlin).Google Scholar
  • [57] Shapiro A, Homem-de Mello T (2000) On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM J. Optim. 11(1):70–86.Google Scholar
  • [58] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.Google Scholar
  • [59] Shapiro A, Dentcheva D, Ruszczyński A (2014) Lectures on stochastic programming, vol. 9, MOS-SIAM Series on Optimization (Society for Industrial and Applied Mathematics (SIAM), Mathematical Optimization Society, Philadelphia, PA).Google Scholar
  • [60] Toulis P, Airoldi E (2017) Asymptotic and finite-sample properties of estimators based on stochastic gradients. Ann. Statist. 45(4):1694–1727.Google Scholar
  • [61] van Erven T, Grünwald PD, Mehta NA, Reid MD, Williamson RC (2015) Fast rates in statistical and online learning. J. Mach. Learn. Res. 16:1793–1861.Google Scholar
  • [62] Vershynin R (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027.Google Scholar
  • [63] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [64] Xu H (2010) Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. J. Math. Anal. Appl. 368(2):692–710.Google Scholar
  • [65] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119(2):371–401.Google Scholar
  • [66] Zemel RS, Culotta A (2010) Smoothness, low noise and fast rates. Lafferty JD,Williams CKI, Shawe-Taylor J, eds. vol. 23, Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 2199–2207.Google Scholar
  • [67] Zhang S, He N (2018) On the convergence rate of stochastic mirror descent for nonsmooth nonconvex optimization. Preprint, submitted June 12, https://arxiv.org/abs/1806.04781.Google Scholar
  • [68] Zinkevich M (2003) Online convex programming and generalized infinitesimal gradient ascent. Proc. 20th Internat. Conf. Mach. Learn. (AAAI Press), 928–935.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.