Graphical Convergence of Subgradients in Nonconvex Optimization and Learning
Published Online:20 Apr 2021https://doi.org/10.1287/moor.2021.1126
References
- [1] (1999) Singularities of semiconcave functions in Banach spaces.McEneaney WM, Yin GG, Zhang Q, eds. Stochastic analysis, control, optimization and application. Systems & Control: Foundations & Applications (Birkhäuser, Boston), 171–190.Crossref, Google Scholar
- [2] (1977) Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. C. R. Acad. Sci. Paris Sér. A-B. 284(10):A539–A542.Google Scholar
- [3] (1984) Variational Convergence for Functions and Operators. Applicable Mathematics Series (Pitman Advanced Publishing Program, Boston, MA).Google Scholar
- [4] (1990) Epigraphical processes: laws of large numbers for random LSC functions. Sém. Anal. Convexe 20. Exp. 13:29.Google Scholar
- [5] (2005) Local Rademacher complexities. Ann. Statist. 33(4):1497–1537.Crossref, Google Scholar
- [6] (2002) Rademacher and gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3(Nov):463–482.Google Scholar
- [7] (2013) Concentration Inequalities: A Nonasymptotic Theory of Independence (Oxford University Press, Oxford, UK).Crossref, Google Scholar
- [8] (2002) Stability and generalization. J. Mach. Learn. Res. 2(3):499–526.Google Scholar
- [9] (1983) Optimization and Nonsmooth Analysis (Wiley Interscience, New York).Google Scholar
- [10] (2018) Uniform graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted December 17, https://arxiv.org/pdf/1810.07590.pdf.Google Scholar
- [11] (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.Crossref, Google Scholar
- [12] (2018) Stochastic subgradient method converges on tame functions. Preprint, submitted April 20, https://arxiv.org/abs/1804.07795.Google Scholar
- [13] (2018) Stochastic model-based minimization under high-order growth. Preprint, submitted July 1, https://arxiv.org/abs/1807.00255.Google Scholar
- [14] (2018) Subgradient methods for sharp weakly convex functions. J. Optim. Theory Appl. 179(3):962–982.Google Scholar
- [15] (2017) The nonsmooth landscape of phase retrieval. Preprint, submitted November 9, https://arxiv.org/abs/1711.03247.Google Scholar
- [16] (2019) Proximally guided stochastic subgradient method for nonsmooth, nonconvex problems. SIAM J. Optim. 29(3):1908–1930.Crossref, Google Scholar
- [17] (2018) The proximal point method revisited. SIAG/OPT Views News. 26(1):1–8.Google Scholar
- [18] (2019) Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program. 178(1-2):503–558.Crossref, Google Scholar
- [19] (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.Crossref, Google Scholar
- [20] (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.Crossref, Google Scholar
- [21] (2018) Uniform convergence of gradients for non-convex learning and optimization. Preprint, submitted October 25, https://arxiv.org/abs/1810.11059.Google Scholar
- [22] (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993–2010.Google Scholar
- [23] (2016) Mini-batch stochastic approximation methods for nonconvex stochastic composite optimization. Math. Programming 155(1-2, Ser. A):267–305.Google Scholar
- [24] (2016) Fast rates for general unbounded loss functions: from ERM to generalized Bayes. Preprint, submitted May 1, https://arxiv.org/abs/1605.00252.Google Scholar
- [25] (2015) Train faster, generalize better: Stability of stochastic gradient descent. Preprint, submitted September 3, https://arxiv.org/abs/1509.01240.Google Scholar
- [26] (2009) On the complexity of linear prediction: Risk bounds, margin bounds, and regularization. Adv. Neural Inform. Process. Syst. 21:793–800.Google Scholar
- [27] (1995) Probabilistic bounds (via large deviations) for the solutions of stochastic programming problems. Ann. Oper. Res. 56(1):189–208.Google Scholar
- [28] (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.Google Scholar
- [29] Koller D, Schuurmans D, Bengio Y, Bottou L, eds. (2009) Fast Rates for Regularized Objectives, vol. 21, Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 1545–1552.Google Scholar
- [30] (2014) Concentration in unbounded metric spaces and algorithmic stability. Internat. Conf. Mach. Learn. 32(2):28–36.Google Scholar
- [31] (2018) Nonconvex robust low-rank matrix recovery. Preprint, submitted September 24, https://arxiv.org/abs/1809.09237.Google Scholar
- [32] (2018) Fast rates of erm and stochastic approximation: Adaptive to error bound conditions. Preprint, submitted May 11, https://arxiv.org/abs/1805.04577.Google Scholar
- [33] (2016) A vector-contraction inequality for Rademacher complexities. Ortner R, Simon HU, Zilles S, eds. Algorithmic Learning Theory (Springer International Publishing, Cham, Switzerland), 3–17.Crossref, Google Scholar
- [34] (1989) On the method of bounded differences. Surveys in Combinatorics, 1989 (Norwich, 1989), London Mathematical Society Lecture Note Series, vol. 141 (Cambridge University Press, Cambridge), 148–188.Google Scholar
- [35] (2016) Fast rates with high probability in exp-concave statistical learning. Preprint, submitted May 4, https://arxiv.org/abs/1605.01288.Google Scholar
- [36] (2014) From stochastic mixability to fast rates. Proc. 27th Internat. Conf. Neural Inform. Processing Sys. (MIT Press, Cambridge, MA).Google Scholar
- [37] (2018) The landscape of empirical risk for nonconvex losses. Ann. Statist. 46(6A):2747–2774.Google Scholar
- [38] (2006) Variational Analysis and Generalized Differentiation I: Basic Theory, vol. 330 (Grundlehren der mathematischen Wissenschaften, Springer, Berlin).Crossref, Google Scholar
- [39] (2018) Variational Analysis and Applications. Springer Monographs in Mathematics (Springer, Cham, Switzerland).Crossref, Google Scholar
- [40] (1965) Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93:273–299.Google Scholar
- [41] (2008) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Google Scholar
- [42] (1973) The quasigradient method for the solving of the nonlinear programming problems. Cybernetics 9(1):145–150.Google Scholar
- [43] (1996) Prox-regular functions in variational analysis. Trans. Amer. Math. Soc. 348:1805–1838.Crossref, Google Scholar
- [44] (2002) Quantitative stability in stochastic programming: The method of probability metrics. Math. Oper. Res. 27(4):792–818.Google Scholar
- [45] (2005) Stability results in learning theory. Anal. Appl. 3(4):397–417.Google Scholar
- [46] (2011) Convergence of stationary points of sample average two-stage stochastic programs: a generalized equation approach. Math. Oper. Res. 36(3):568–592.Google Scholar
- [47] (1996) Analysis of sample-path optimization. Math. Oper. Res. 21(3):513–528.Google Scholar
- [48] (1970) Convex Analysis. Princeton Mathematical Series, No. 28 (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
- [49] (1982) Favorable classes of Lipschitz-continuous functions in subgradient optimization. Progress in Nondifferentiable Optimization, vol. 8, IIASA Collaborative Proc. Ser. CP-82 (International Institute for Applied Systems Analysis, Laxenburg, Austria), 125–143.Google Scholar
- [50] (1998) Variational Analysis, vol. 317 (Grundlehren der mathematischen Wissenschaften, Springer, Berlin).Crossref, Google Scholar
- [51] (1979) On paraconvex multifunctions. Third Symposium on Operations Research (Univ. Mannheim, Mannheim, 1978), section I, vol. 31, Oper. Res. Verfahren (Hain, Königstein/Ts.), 539–546.Google Scholar
- [52] (2007) Stability of ɛ-approximate solutions to convex stochastic programs. SIAM J. Optim. 18(3):961–979.Google Scholar
- [53] (2014) Understanding Machine Learning: From Theory to Algorithms, 1st ed. (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [54] (2009) Stochastic convex optimization. Proc. Conf. Learn. Theory (COLT).Google Scholar
- [55] (2000) On the asymptotics of constrained local M-estimators. Ann. Statist. 28(3):948–960.Google Scholar
- [56] (2000) Stochastic Programming by Monte Carlo Simulation Methods (Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, Institut für Mathematik, Berlin).Google Scholar
- [57] (2000) On the rate of convergence of optimal solutions of Monte Carlo approximations of stochastic programs. SIAM J. Optim. 11(1):70–86.Google Scholar
- [58] (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.Google Scholar
- [59] (2014) Lectures on stochastic programming, vol. 9, MOS-SIAM Series on Optimization (Society for Industrial and Applied Mathematics (SIAM), Mathematical Optimization Society, Philadelphia, PA).Google Scholar
- [60] (2017) Asymptotic and finite-sample properties of estimators based on stochastic gradients. Ann. Statist. 45(4):1694–1727.Google Scholar
- [61] (2015) Fast rates in statistical and online learning. J. Mach. Learn. Res. 16:1793–1861.Google Scholar
- [62] (2010) Introduction to the non-asymptotic analysis of random matrices. Preprint, submitted November 12, https://arxiv.org/abs/1011.3027.Google Scholar
- [63] (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
- [64] (2010) Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. J. Math. Anal. Appl. 368(2):692–710.Google Scholar
- [65] (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119(2):371–401.Google Scholar
- [66] (2010) Smoothness, low noise and fast rates. Lafferty JD,Williams CKI, Shawe-Taylor J, eds. vol. 23, Advances in Neural Information Processing Systems (Curran Associates, Inc., Red Hook, NY), 2199–2207.Google Scholar
- [67] (2018) On the convergence rate of stochastic mirror descent for nonsmooth nonconvex optimization. Preprint, submitted June 12, https://arxiv.org/abs/1806.04781.Google Scholar
- [68] (2003) Online convex programming and generalized infinitesimal gradient ascent. Proc. 20th Internat. Conf. Mach. Learn. (AAAI Press), 928–935.Google Scholar

