On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning

Published Online:https://doi.org/10.1287/moor.2024.0533

References

  • [1] Armstrong MA (2013) Basic Topology (Springer, New York).Google Scholar
  • [2] Artstein Z, Vitale RA (1975) A strong law of large numbers for random compact sets. Ann. Probab. 3(5):879–882.CrossrefGoogle Scholar
  • [3] Attouch H (1977) Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. CR Acad. Sci. Paris 284:539–542.Google Scholar
  • [4] Attouch H, Beer G (1993) On the convergence of subdifferentials of convex functions. Arch. Math. 60(4):389–400.CrossrefGoogle Scholar
  • [5] Aubin J-P, Frankowska H (2009) Set-Valued Analysis (Springer, New York).CrossrefGoogle Scholar
  • [6] Aumann RJ (1965) Integrals of set-valued functions. J. Math. Anal. Appl. 12(1):1–12.CrossrefGoogle Scholar
  • [7] Bai Y, Jiang Q, Sun J (2019) Subgradient descent learns orthogonal dictionaries. Seventh Internat. Conf. Learn. Representation (ICLR, Appleton, WI).Google Scholar
  • [8] Billingsley P (1986) Probability and Measure, 2nd ed. (Wiley, New York).Google Scholar
  • [9] Bolte J, Daniilidis A, Ley O, Mazet L (2010) Characterizations of Lojasiewicz inequalities: Subgradient flows, talweg, convexity. Trans. Amer. Math. Soc. 362(6): 3319–3363.Google Scholar
  • [10] Borwein JM, Vanderwerff JD (2010) Convex functions: Constructions. Characterizations and Counterexamples, vol. 109 (Cambridge University Press, Cambridge, UK).Google Scholar
  • [11] Burke JV (1985) Descent methods for composite nondifferentiable optimization problems. Math. Programming 33:260–279.CrossrefGoogle Scholar
  • [12] Candes EJ, Li X, Soltanolkotabi M (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985–2007.CrossrefGoogle Scholar
  • [13] Charisopoulos V, Davis D, Díaz M, Drusvyatskiy D (2021) Composite optimization for robust rank one bilinear sensing. Inform. Inference 10(2):333–396.CrossrefGoogle Scholar
  • [14] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21(6):1505–1593.CrossrefGoogle Scholar
  • [15] Davis D, Drusvyatskiy D (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.CrossrefGoogle Scholar
  • [16] Davis D, Drusvyatskiy D (2020) Subgradient methods under weak convexity and tame geometry. SIAG/OPT Views News 28:1–10.Google Scholar
  • [17] Davis D, Drusvyatskiy D (2022) Graphical convergence of subgradients in nonconvex optimization and learning. Math. Oper. Res. 47(1):209–231.LinkGoogle Scholar
  • [18] Davis D, Drusvyatskiy D, Paquette C (2020) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652–2695.CrossrefGoogle Scholar
  • [19] Díaz M (2019) The nonsmooth landscape of blind deconvolution. Workshop Optim. Machine Learn.Google Scholar
  • [20] Ding L, Jiang L, Chen Y, Qu Q, Zhu Z (2021) Rank overspecified robust matrix recovery: Subgradient method and exact recovery. Preprint, submitted September 23, https://arxiv.org/abs/2109.11154.Google Scholar
  • [21] Duchi JC, Ruan F (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.CrossrefGoogle Scholar
  • [22] Duchi JC, Ruan F (2019) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.CrossrefGoogle Scholar
  • [23] Dudley RM (1967) The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Anal. 1(3):290–330.CrossrefGoogle Scholar
  • [24] Eldar YC, Mendelson S (2014) Phase retrieval: Stability and recovery guarantees. Appl. Comput. Harmonic Anal. 36(3):473–494.CrossrefGoogle Scholar
  • [25] Han X, Lewis AS (2023) Survey descent: A multipoint generalization of gradient descent for nonsmooth optimization. SIAM J. Optim. 33(1):36–62.CrossrefGoogle Scholar
  • [26] Horowitz JL (1998) Bootstrap methods for median regression models. Econometrica 66(6):1327–1351.CrossrefGoogle Scholar
  • [27] Koenker R (2005) Quantile Regression, vol. 38 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [28] Koenker R, Park BJ (1996) An interior point algorithm for nonlinear quantile regression. J. Econometrics 71(1–2):265–283.CrossrefGoogle Scholar
  • [29] Lam H, Wang K, Wu Y, Zhang Y (2022) Adaptive data fusion for multi-task non-smooth optimization. Preprint, submitted October 22, https://arxiv.org/abs/2210.12334.Google Scholar
  • [30] Li X, Zhu Z, Man-Cho So A, Vidal R (2020) Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1):660–686.CrossrefGoogle Scholar
  • [31] Liu J, Cui Y, Pang J-S (2022) Solving nonsmooth and nonconvex compound stochastic programs with applications to risk measure minimization. Math. Oper. Res. 47(4):3051–3083.LinkGoogle Scholar
  • [32] Ma J, Fattahi S (2021) Sign-rip: A robust restricted isometry property for low-rank matrix recovery. Adv. Neural Inform. Processing Systems Workshop Optim. Machine Learn.Google Scholar
  • [33] Matoušek J (2013) Lectures on Discrete Geometry, vol. 212 (Springer, New York).Google Scholar
  • [34] Milnor J (1964) On the Betti numbers of real varieties. Proc. Amer. Math. Soc. 15(2):275–280.CrossrefGoogle Scholar
  • [35] Molchanov I (2005) Theory of Random Sets, vol. 19 (Springer, New York).Google Scholar
  • [36] Moreau J-J (1965) Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93:273–299.CrossrefGoogle Scholar
  • [37] Poliquin R, Rockafellar RT (1992) Amenable functions in optimization. Nonsmooth Optimization: Methods and Applications (Erice), 338–353.Google Scholar
  • [38] Poliquin R, Rockafellar RT (1996) Prox-regular functions in variational analysis. Trans. Amer. Math. Soc. 348(5):1805–1838.CrossrefGoogle Scholar
  • [39] Ralph D, Xu H (2011) Convergence of stationary points of sample average two-stage stochastic programs: A generalized equation approach. Math. Oper. Res. 36(3):568–592.LinkGoogle Scholar
  • [40] Rockafellar RT (1970) Convex Analysis (Princeton University Press, Princeton, NJ).CrossrefGoogle Scholar
  • [41] Rockafellar RT (1981) Favorable classes of Lipschitz continuous functions in subgradient optimization.Google Scholar
  • [42] Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J Risk 2(3):21–42.CrossrefGoogle Scholar
  • [43] Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, New York).CrossrefGoogle Scholar
  • [44] Sauer N (1972) On the density of families of sets. J. Combin. Theory Ser. A 13(1):145–147.CrossrefGoogle Scholar
  • [45] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.CrossrefGoogle Scholar
  • [46] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).CrossrefGoogle Scholar
  • [47] Shelah S (1972) A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific J. Math. 41(1):247–261.CrossrefGoogle Scholar
  • [48] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131–1198.CrossrefGoogle Scholar
  • [49] Thom R (1965) On the homology of real algebraic varieties. Differential and Combinatorial Topology.CrossrefGoogle Scholar
  • [50] van der Vaart AW (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar
  • [51] van der Vaart AW, Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics (Springer, New York).CrossrefGoogle Scholar
  • [52] Vapnik VN (2013) The Nature of Statistical Learning Theory (Springer, New York).Google Scholar
  • [53] Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16(2):264–280.CrossrefGoogle Scholar
  • [54] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [55] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).CrossrefGoogle Scholar
  • [56] Wang G, Giannakis GB, Eldar YC (2017) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773–794.CrossrefGoogle Scholar
  • [57] Warren HE (1968) Lower bounds for approximation by nonlinear manifolds. Trans. Amer. Math. Soc. 133(1):167–178.CrossrefGoogle Scholar
  • [58] Xu H (2010) Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. J. Math. Anal. Appl. 368(2):692–710.CrossrefGoogle Scholar
  • [59] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371–401.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.