Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization
Published Online: 10 Nov 2021. https://doi.org/10.1287/moor.2021.1198
References
- [1] (1984) Variational Convergence for Functions and Operators (Pitman Press, Boston).
- [2] (2012) The Normal Distribution: Characterizations with Applications (Springer Science & Business Media, New York).
- [3] (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985–2007.
- [4] (2019) Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Math. Programming 176:5–37.
- [5] (1954) On the distribution of the likelihood ratio. Ann. Math. Statist. 25(3):573–578.
- [6] (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.
- [7] (1990) Optimization and Nonsmooth Analysis. Classics in Applied Mathematics, vol. 5 (SIAM, Philadelphia). Reprint of the John Wiley edition, New York.
- [8] (2018) Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4):3344–3374.
- [9] (2020) A study of piecewise linear-quadratic programs. J. Optim. Theory Appl. 186:523–553.
- [10] (2018) Graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted October 17, 2018, https://arxiv.org/abs/1810.07590.
- [11] (2018) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652–2695.
- [12] (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.
- [13] (2017) Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Statist. Math. 69:737–760.
- [14] (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.
- [15] (2013) Multivariate convex regression with adaptive partitioning. J. Machine Learn. Res. 14:3261–3294.
- [16] (1988) Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16(4):1517–1549.
- [17] (2003) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer, New York).
- [18] (1996) A Course in Large Sample Theory (Routledge).
- [19] (1922) On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. A 222:594–604.
- [20] (1925) Theory of statistical estimation. Math. Proc. Cambridge Philos. Soc. 22:700–725.
- [21] (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993–2010.
- [22] (2011) Deep sparse rectifier neural networks. Proc. 14th Internat. Conf. Artificial Intelligence Statist., 315–323.
- [23] (1999) Sample-path solution of stochastic variational inequalities. Math. Programming 84:313–333.
- [24] (1967) The behavior of maximum likelihood estimates under nonstandard conditions. Le Cam LM, Neyman J, eds. Proc. Fifth Berkeley Sympos. Math. Statist. Probab., vol. 1 (University of California Press), 221–233.
- [25] (2018) On the local minima of the empirical risk. Proc. Neural Inform. Processing Systems (NIPS).
- [26] (1993) Asymptotic behaviour of solutions in stochastic optimization: nonsmooth analysis and the derivation of non-normal limit distributions. Unpublished PhD dissertation, University of Washington, Seattle.
- [27] (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.
- [28] (1970) On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41(3):802–828.
- [29] (2005) The DC programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133:25–46.
- [30] (2017) Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Statist. 45(2):866–896.
- [31] (2012) High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40(3):1637–1664.
- [32] (2015) Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res. 16(1):559–616.
- [33] (2019) Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Math. Programming 176:369–401.
- [34] (2016) Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann. Statist. 44(2):713–742.
- [35] (2019) Optimization-based AMP for phase retrieval: The impact of initialization and ℓ2 regularization. IEEE Trans. Inform. Theory 65(6):3600–3629.
- [36] (2018) The landscape of empirical risk for non-convex losses. Ann. Statist. 46:2747–2774.
- [37] (2005) Theory of Random Sets, vol. 19, no. 2 (Springer, London).
- [38] (2010) Rectified linear units improve restricted Boltzmann machines. Proc. 27th Internat. Conf. Machine Learn., 807–814.
- [39] (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.
- [40] (2016) Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42(1):95–118.
- [41] (1990) New stochastic approximation type procedures. Avtomatika i Telemekhanika 51(7):98–107.
- [42] (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.
- [43] (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.
- [44] (1970) Convex Analysis (Princeton University Press, Princeton).
- [45] (1998) Variational Analysis (Springer, New York).
- [46] (2020) Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Programming Ser. A 184:289–318.
- [47] (2020) Variational analysis of constrained M-estimators. Ann. Statist. 48:2759–2790.
- [48] (2002) Introduction to Piecewise Differentiable Equations (Springer).
- [49] (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.
- [50] (2003) Monte Carlo sampling methods. Ruszczyński A, Shapiro A, eds. Stochastic Programming, Handbooks in OR & MS, vol. 10 (North-Holland Publishing Company, Amsterdam), 353–425.
- [51] (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.
- [52] (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM Publications, Philadelphia).
- [53] (2015) Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine 32(3):87–109.
- [54] (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131–1198.
- [55] (2000) Empirical Processes in M-estimation, vol. 6 (Cambridge University Press).
- [56] (1998) Asymptotic Statistics, vol. 3 (Cambridge University Press).
- [57] (1996) Weak Convergence and Empirical Processes (Springer).
- [58] (2010) Sample average approximation methods for a class of stochastic variational inequality problems. Asia-Pacific J. Oper. Res. 27(1):103–119.
- [59] (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371–401.
- [60] (1949) Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20(4):595–601.
- [61] (2018) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773–794.
- [62] (1979) A statistical approach to the solution of stochastic programs with (convex) simple recourse. Working paper, University of Kentucky, Lexington.

