Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

Published Online:https://doi.org/10.1287/moor.2021.1198

References

  • [1] Attouch H (1984) Variational Convergence for Functions and Operators (Pitman Press, Boston).Google Scholar
  • [2] Bryc W (2012) The Normal Distribution: Characterizations with Applications (Springer Science & Business Media, New York).Google Scholar
  • [3] Candes EJ, Li X, Soltanolkotabi M (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985–2007.CrossrefGoogle Scholar
  • [4] Chen Y, Chi Y, Fan J, Ma C (2019) Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Math. Programming 176:5–37.CrossrefGoogle Scholar
  • [5] Chernoff H (1954) On the distribution of the likelihood ratio. Ann. Math. Statist. 25(3):573–578.CrossrefGoogle Scholar
  • [6] Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.CrossrefGoogle Scholar
  • [7] Clarke FH (1990) Optimization and Nonsmooth Analysis. Classics in Applied Mathematics. SIAM, vol. 5, (Reprint from John Wiley, New York).CrossrefGoogle Scholar
  • [8] Cui Y, Pang JS, Sen B (2018) Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4):3344–3374.CrossrefGoogle Scholar
  • [9] Cui Y, Chang TH, Hong M, Pang JS (2020) A study of piecewise linear-quadratic programs. J. Optim. Theory Appl. 186:523–553.CrossrefGoogle Scholar
  • [10] Davis D, Drusvyatskiy D (2018) Graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted October 17, https://arxiv.org/abs/1810.07590.Google Scholar
  • [11] Davis D, Drusvyatskiy D, Paquette C (2018) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652–2695.CrossrefGoogle Scholar
  • [12] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.CrossrefGoogle Scholar
  • [13] Dentcheva D, Penev S, Ruszczynski A (2017) Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Statist. Math. 69:737–760.CrossrefGoogle Scholar
  • [14] Duchi J, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.CrossrefGoogle Scholar
  • [15] Dunson D, Hannah LA (2013) Multivariate convex regression with adaptive partitioning. J. Machine Learn. Res. 14:3261–3294.Google Scholar
  • [16] Dupačová J, Wets RJ-B (1988) Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16(4):1517–1549.CrossrefGoogle Scholar
  • [17] Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer, New York).Google Scholar
  • [18] Ferguson TS (1996) A Course in Large Sample Theory (Routledge).CrossrefGoogle Scholar
  • [19] Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. A 222:594–604.Google Scholar
  • [20] Fisher RA (1925) Theory of statistical estimation. Math. Proc. Cambridge Philos. Soc. 22:700–725.CrossrefGoogle Scholar
  • [21] Geyer CJ (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993–2010.CrossrefGoogle Scholar
  • [22] Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. Proc. 14th Internat. Conf. Artificial Intelligence Statist. 315–323.Google Scholar
  • [23] Gürkan G, Yonca Özge A, Robinson S (1999) Sample-path solution of stochastic variational inequalities. Math. Programming 84:313–333.CrossrefGoogle Scholar
  • [24] Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. Le Cam LM, Neyman J, eds. Proc. Fifth Berkeley Sympos. Math. Statist. Probab., vol. 1 (University of California Press), 221–233.Google Scholar
  • [25] Jin C, Liu L, Ge R, Jordan M (2018) On the local minima of the empirical risk. Proc. Neural Inform. Processing Systems (NIPS).Google Scholar
  • [26] King AJ (1993) Asymptotic behaviour of solutions in stochastic optimization: nonsmooth analysis and the derivation of non-normal limit distributions. Unpublished PhD dissertation, University of Washington, Seattle.Google Scholar
  • [27] King A, Rockafellar RT (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.LinkGoogle Scholar
  • [28] LeCam L (1970) On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41(3):802–828.CrossrefGoogle Scholar
  • [29] Le Thi HA, Pham DT (2005) The DC programming and DCA revised with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133:25–46.Google Scholar
  • [30] Loh PL (2017) Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Statist. 45(2):866–896.CrossrefGoogle Scholar
  • [31] Loh PL, Wainwright M (2012) High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40(3):1637–1664.CrossrefGoogle Scholar
  • [32] Loh PL, Wainwright M (2015) Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res. 16(1):559–616.Google Scholar
  • [33] Lu Z, Zhou Z, Sun Z (2019) Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Math. Programming 176:369–401.CrossrefGoogle Scholar
  • [34] Luedtke AR, Van Der Laan MJ (2016) Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann. Statist. 44(2):713–742.CrossrefGoogle Scholar
  • [35] Ma J, Xu J, Maleki A (2019) Optimization-based AMP for phase retrieval: The impact of initialization and ℓ2 regularization. IEEE Trans. Inform. Theory 65(6):3600–3629.CrossrefGoogle Scholar
  • [36] Mei S, Bai Y, Montanari A (2018) The landscape of empirical risk for non-convex losses. Ann. Statist. 46:2747–2774.CrossrefGoogle Scholar
  • [37] Molchanov I (2005) Theory of Random Sets, vol. 19, no. 2 (Springer, London).Google Scholar
  • [38] Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. Proc. 27th Internat. Conf. Machine Learn., 807–814.Google Scholar
  • [39] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.CrossrefGoogle Scholar
  • [40] Pang JS, Razaviyayn M, Alvarado A (2016) Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42(1):95–118.LinkGoogle Scholar
  • [41] Polyak B (1990) New stochastic approximation type procedures. Avtomatica i Telemekhanika 51(7):98–107.Google Scholar
  • [42] Polyak B, Juditsky A (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.CrossrefGoogle Scholar
  • [43] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.CrossrefGoogle Scholar
  • [44] Rockafellar RT (1970) Convex Analysis (Princeton University Press, Princeton).CrossrefGoogle Scholar
  • [45] Rockafellar RT, Wets RJ-B (1998) Variational Analysis (Springer, New York).CrossrefGoogle Scholar
  • [46] Royset J (2020) Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Programming Ser. A 184:289–318.CrossrefGoogle Scholar
  • [47] Royset J, Wets RJ-B (2020) Variational analysis of constrained M-estimators. Ann. Statist. 48:2759–2790.CrossrefGoogle Scholar
  • [48] Scholtes S (2002) Introduction to Piecewise Differentiable Equations (Springer).Google Scholar
  • [49] Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.CrossrefGoogle Scholar
  • [50] Shapiro A (2003) Monte Carlo sampling methods. Rusczyński A, Shapiro A, eds. Stochastic Programming, Handbooks in OR & MS, vol. 10 (NorthHolland Publishing Company, Amsterdam), 353–425.CrossrefGoogle Scholar
  • [51] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.CrossrefGoogle Scholar
  • [52] Shapiro A, Dentcheva D, Ruszczyński A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM Publications, Philadelphia).CrossrefGoogle Scholar
  • [53] Shechtman Y, Eldar Y, Cohen O, Chapman H, Miao J, Segev M (2015) Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine 32(3):87–109.CrossrefGoogle Scholar
  • [54] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131–1198.CrossrefGoogle Scholar
  • [55] van de Geer S (2000) Empirical Processes in M-estimation, vol. 6 (Cambridge University Press).Google Scholar
  • [56] van der Vaart AW (1998) Asymptotic Statistics, vol. 3 (Cambridge University Press).CrossrefGoogle Scholar
  • [57] van der Vaart AW, Wellner J (1996) Weak Convergence and Empirical Processes (Springer).CrossrefGoogle Scholar
  • [58] Xu H (2010) Sample average approximation methods for a class of stochastic variational inequality problems. Asia-Pacific J. Oper. Res. 27(1):103–119.CrossrefGoogle Scholar
  • [59] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371–401.CrossrefGoogle Scholar
  • [60] Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20(4):595–601.CrossrefGoogle Scholar
  • [61] Wang G, Giannakis GB, Eldar YC (2018) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773–794.CrossrefGoogle Scholar
  • [62] Wets RJB (1979) A statistical approach to the solution of stochastic programs with (convex) simple recourse. Working paper, University of Kentucky, Lexington.Google Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.