On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning

Feng Ruan
Feng Ruan
[email protected]
https://orcid.org/0000-0003-1336-8229
Department of Statistics and Data Science, Northwestern University, Evanston, Illinois 60208
Search for more papers by this author

Department of Statistics and Data Science, Northwestern University, Evanston, Illinois 60208

Search for more papers by this author

Published Online:26 Aug 2025https://doi.org/10.1287/moor.2024.0533

References

[1] Armstrong MA (2013) Basic Topology (Springer, New York).Google Scholar
[2] Artstein Z, Vitale RA (1975) A strong law of large numbers for random compact sets. Ann. Probab. 3(5):879–882.Crossref, Google Scholar
[3] Attouch H (1977) Convergence de fonctions convexes, des sous-différentiels et semi-groupes associés. CR Acad. Sci. Paris 284:539–542.Google Scholar
[4] Attouch H, Beer G (1993) On the convergence of subdifferentials of convex functions. Arch. Math. 60(4):389–400.Crossref, Google Scholar
[5] Aubin J-P, Frankowska H (2009) Set-Valued Analysis (Springer, New York).Crossref, Google Scholar
[6] Aumann RJ (1965) Integrals of set-valued functions. J. Math. Anal. Appl. 12(1):1–12.Crossref, Google Scholar
[7] Bai Y, Jiang Q, Sun J (2019) Subgradient descent learns orthogonal dictionaries. Seventh Internat. Conf. Learn. Representation (ICLR, Appleton, WI).Google Scholar
[8] Billingsley P (1986) Probability and Measure, 2nd ed. (Wiley, New York).Google Scholar
[9] Bolte J, Daniilidis A, Ley O, Mazet L (2010) Characterizations of Lojasiewicz inequalities: Subgradient flows, talweg, convexity. Trans. Amer. Math. Soc. 362(6): 3319–3363.Google Scholar
[10] Borwein JM, Vanderwerff JD (2010) Convex functions: Constructions. Characterizations and Counterexamples, vol. 109 (Cambridge University Press, Cambridge, UK).Google Scholar
[11] Burke JV (1985) Descent methods for composite nondifferentiable optimization problems. Math. Programming 33:260–279.Crossref, Google Scholar
[12] Candes EJ, Li X, Soltanolkotabi M (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985–2007.Crossref, Google Scholar
[13] Charisopoulos V, Davis D, Díaz M, Drusvyatskiy D (2021) Composite optimization for robust rank one bilinear sensing. Inform. Inference 10(2):333–396.Crossref, Google Scholar
[14] Charisopoulos V, Chen Y, Davis D, Díaz M, Ding L, Drusvyatskiy D (2021) Low-rank matrix recovery with composite optimization: Good conditioning and rapid convergence. Foundations Comput. Math. 21(6):1505–1593.Crossref, Google Scholar
[15] Davis D, Drusvyatskiy D (2019) Stochastic model-based minimization of weakly convex functions. SIAM J. Optim. 29(1):207–239.Crossref, Google Scholar
[16] Davis D, Drusvyatskiy D (2020) Subgradient methods under weak convexity and tame geometry. SIAG/OPT Views News 28:1–10.Google Scholar
[17] Davis D, Drusvyatskiy D (2022) Graphical convergence of subgradients in nonconvex optimization and learning. Math. Oper. Res. 47(1):209–231.Link, Google Scholar
[18] Davis D, Drusvyatskiy D, Paquette C (2020) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652–2695.Crossref, Google Scholar
[19] Díaz M (2019) The nonsmooth landscape of blind deconvolution. Workshop Optim. Machine Learn.Google Scholar
[20] Ding L, Jiang L, Chen Y, Qu Q, Zhu Z (2021) Rank overspecified robust matrix recovery: Subgradient method and exact recovery. Preprint, submitted September 23, https://arxiv.org/abs/2109.11154.Google Scholar
[21] Duchi JC, Ruan F (2018) Stochastic methods for composite and weakly convex optimization problems. SIAM J. Optim. 28(4):3229–3259.Crossref, Google Scholar
[22] Duchi JC, Ruan F (2019) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.Crossref, Google Scholar
[23] Dudley RM (1967) The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Anal. 1(3):290–330.Crossref, Google Scholar
[24] Eldar YC, Mendelson S (2014) Phase retrieval: Stability and recovery guarantees. Appl. Comput. Harmonic Anal. 36(3):473–494.Crossref, Google Scholar
[25] Han X, Lewis AS (2023) Survey descent: A multipoint generalization of gradient descent for nonsmooth optimization. SIAM J. Optim. 33(1):36–62.Crossref, Google Scholar
[26] Horowitz JL (1998) Bootstrap methods for median regression models. Econometrica 66(6):1327–1351.Crossref, Google Scholar
[27] Koenker R (2005) Quantile Regression, vol. 38 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[28] Koenker R, Park BJ (1996) An interior point algorithm for nonlinear quantile regression. J. Econometrics 71(1–2):265–283.Crossref, Google Scholar
[29] Lam H, Wang K, Wu Y, Zhang Y (2022) Adaptive data fusion for multi-task non-smooth optimization. Preprint, submitted October 22, https://arxiv.org/abs/2210.12334.Google Scholar
[30] Li X, Zhu Z, Man-Cho So A, Vidal R (2020) Nonconvex robust low-rank matrix recovery. SIAM J. Optim. 30(1):660–686.Crossref, Google Scholar
[31] Liu J, Cui Y, Pang J-S (2022) Solving nonsmooth and nonconvex compound stochastic programs with applications to risk measure minimization. Math. Oper. Res. 47(4):3051–3083.Link, Google Scholar
[32] Ma J, Fattahi S (2021) Sign-rip: A robust restricted isometry property for low-rank matrix recovery. Adv. Neural Inform. Processing Systems Workshop Optim. Machine Learn.Google Scholar
[33] Matoušek J (2013) Lectures on Discrete Geometry, vol. 212 (Springer, New York).Google Scholar
[34] Milnor J (1964) On the Betti numbers of real varieties. Proc. Amer. Math. Soc. 15(2):275–280.Crossref, Google Scholar
[35] Molchanov I (2005) Theory of Random Sets, vol. 19 (Springer, New York).Google Scholar
[36] Moreau J-J (1965) Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. France 93:273–299.Crossref, Google Scholar
[37] Poliquin R, Rockafellar RT (1992) Amenable functions in optimization. Nonsmooth Optimization: Methods and Applications (Erice), 338–353.Google Scholar
[38] Poliquin R, Rockafellar RT (1996) Prox-regular functions in variational analysis. Trans. Amer. Math. Soc. 348(5):1805–1838.Crossref, Google Scholar
[39] Ralph D, Xu H (2011) Convergence of stationary points of sample average two-stage stochastic programs: A generalized equation approach. Math. Oper. Res. 36(3):568–592.Link, Google Scholar
[40] Rockafellar RT (1970) Convex Analysis (Princeton University Press, Princeton, NJ).Crossref, Google Scholar
[41] Rockafellar RT (1981) Favorable classes of Lipschitz continuous functions in subgradient optimization.Google Scholar
[42] Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J Risk 2(3):21–42.Crossref, Google Scholar
[43] Rockafellar RT, Wets RJ (1998) Variational Analysis (Springer, New York).Crossref, Google Scholar
[44] Sauer N (1972) On the density of families of sets. J. Combin. Theory Ser. A 13(1):145–147.Crossref, Google Scholar
[45] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.Crossref, Google Scholar
[46] Shapiro A, Dentcheva D, Ruszczynski A (2021) Lectures on Stochastic Programming: Modeling and Theory (SIAM, Philadelphia).Crossref, Google Scholar
[47] Shelah S (1972) A combinatorial problem; stability and order for models and theories in infinitary languages. Pacific J. Math. 41(1):247–261.Crossref, Google Scholar
[48] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131–1198.Crossref, Google Scholar
[49] Thom R (1965) On the homology of real algebraic varieties. Differential and Combinatorial Topology.Crossref, Google Scholar
[50] van der Vaart AW (2000) Asymptotic Statistics, vol. 3 (Cambridge University Press, Cambridge, UK).Google Scholar
[51] van der Vaart AW, Wellner JA (1996) Weak Convergence and Empirical Processes: With Applications to Statistics (Springer, New York).Crossref, Google Scholar
[52] Vapnik VN (2013) The Nature of Statistical Learning Theory (Springer, New York).Google Scholar
[53] Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16(2):264–280.Crossref, Google Scholar
[54] Vershynin R (2018) High-Dimensional Probability: An Introduction with Applications in Data Science, vol. 47 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[55] Wainwright MJ (2019) High-Dimensional Statistics: A Non-Asymptotic Viewpoint, vol. 48 (Cambridge University Press, Cambridge, UK).Crossref, Google Scholar
[56] Wang G, Giannakis GB, Eldar YC (2017) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773–794.Crossref, Google Scholar
[57] Warren HE (1968) Lower bounds for approximation by nonlinear manifolds. Trans. Amer. Math. Soc. 133(1):167–178.Crossref, Google Scholar
[58] Xu H (2010) Uniform exponential convergence of sample average random functions under general sampling with applications in stochastic programming. J. Math. Anal. Appl. 368(2):692–710.Crossref, Google Scholar
[59] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371–401.Crossref, Google Scholar

cover image Mathematics of Operations Research

Articles In Advance

Article Information

Metrics

Information

Received:May 22, 2024
Accepted:May 06, 2025
Published Online:August 26, 2025

Cite as

Feng Ruan (2025) On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning. Mathematics of Operations Research 0(0).

https://doi.org/10.1287/moor.2024.0533

Keywords

Acknowledgments

The author extends his gratitude to his friends and advisor—John Duchi, Qiyang Han, X. Y. Han, Kate Yao, and Yichen Zhang—for their support and patience as he tackled this once seemingly challenging problem. A shift in perspective during his time at Northwestern led him to this solution. The author is also grateful to Damek Davis and Dmitriy Drusvyatskiy for their valuable feedback on the initial manuscript draft posted on arXiv. The author also expresses gratitude to Alexander Shapiro and the anonymous reviewers for their valuable discussions on literature and insights regarding the proof of Theorem 1 using variational analytic tools.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

On the Uniform Convergence of Subdifferentials in Stochastic Optimization and Learning

References

Articles In Advance

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News