Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

Zhengling Qi
Zhengling Qi
[email protected]
https://orcid.org/0000-0003-0270-7969
Department of Decision Sciences, George Washington University, Washington, District of Columbia, 20052;
Search for more papers by this author
,
Ying Cui
Ying Cui
[email protected]
https://orcid.org/0000-0003-4173-5647
Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455;
Search for more papers by this author
,
Yufeng Liu
Yufeng Liu
[email protected]
https://orcid.org/0000-0002-1686-0545
Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina 27599;
Search for more papers by this author
,
Jong-Shi Pang
Jong-Shi Pang
[email protected]
https://orcid.org/0000-0002-2224-6787
The Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90007
Search for more papers by this author

Department of Decision Sciences, George Washington University, Washington, District of Columbia, 20052;

Search for more papers by this author

Ying Cui

[email protected]

https://orcid.org/0000-0003-4173-5647

Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455;

Search for more papers by this author

Yufeng Liu

[email protected]

https://orcid.org/0000-0002-1686-0545

Department of Statistics and Operations Research, Department of Genetics, Department of Biostatistics, Carolina Center for Genome Sciences, Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, Chapel Hill, North Carolina 27599;

Search for more papers by this author

Jong-Shi Pang

[email protected]

https://orcid.org/0000-0002-2224-6787

The Daniel J. Epstein Department of Industrial and Systems Engineering, University of Southern California, Los Angeles, California 90007

Search for more papers by this author

Published Online:10 Nov 2021https://doi.org/10.1287/moor.2021.1198

References

[1] Attouch H (1984) Variational Convergence for Functions and Operators (Pitman Press, Boston).Google Scholar
[2] Bryc W (2012) The Normal Distribution: Characterizations with Applications (Springer Science & Business Media, New York).Google Scholar
[3] Candes EJ, Li X, Soltanolkotabi M (2015) Phase retrieval via Wirtinger flow: Theory and algorithms. IEEE Trans. Inform. Theory 61(4):1985–2007.Crossref, Google Scholar
[4] Chen Y, Chi Y, Fan J, Ma C (2019) Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Math. Programming 176:5–37.Crossref, Google Scholar
[5] Chernoff H (1954) On the distribution of the likelihood ratio. Ann. Math. Statist. 25(3):573–578.Crossref, Google Scholar
[6] Chung KL (1954) On a stochastic approximation method. Ann. Math. Statist. 25(3):463–483.Crossref, Google Scholar
[7] Clarke FH (1990) Optimization and Nonsmooth Analysis. Classics in Applied Mathematics. SIAM, vol. 5, (Reprint from John Wiley, New York).Crossref, Google Scholar
[8] Cui Y, Pang JS, Sen B (2018) Composite difference-max programs for modern statistical estimation problems. SIAM J. Optim. 28(4):3344–3374.Crossref, Google Scholar
[9] Cui Y, Chang TH, Hong M, Pang JS (2020) A study of piecewise linear-quadratic programs. J. Optim. Theory Appl. 186:523–553.Crossref, Google Scholar
[10] Davis D, Drusvyatskiy D (2018) Graphical convergence of subgradients in nonconvex optimization and learning. Preprint, submitted October 17, https://arxiv.org/abs/1810.07590.Google Scholar
[11] Davis D, Drusvyatskiy D, Paquette C (2018) The nonsmooth landscape of phase retrieval. IMA J. Numerical Anal. 40(4):2652–2695.Crossref, Google Scholar
[12] Davis D, Drusvyatskiy D, Kakade S, Lee JD (2020) Stochastic subgradient method converges on tame functions. Foundations Comput. Math. 20(1):119–154.Crossref, Google Scholar
[13] Dentcheva D, Penev S, Ruszczynski A (2017) Statistical estimation of composite risk functionals and risk optimization problems. Ann. Inst. Statist. Math. 69:737–760.Crossref, Google Scholar
[14] Duchi J, Ruan F (2018) Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inform. Inference 8(3):471–529.Crossref, Google Scholar
[15] Dunson D, Hannah LA (2013) Multivariate convex regression with adaptive partitioning. J. Machine Learn. Res. 14:3261–3294.Google Scholar
[16] Dupačová J, Wets RJ-B (1988) Asymptotic behavior of statistical estimators and of optimal solutions of stochastic optimization problems. Ann. Statist. 16(4):1517–1549.Crossref, Google Scholar
[17] Facchinei F, Pang JS (2003) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer, New York).Google Scholar
[18] Ferguson TS (1996) A Course in Large Sample Theory (Routledge).Crossref, Google Scholar
[19] Fisher RA (1922) On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. A 222:594–604.Google Scholar
[20] Fisher RA (1925) Theory of statistical estimation. Math. Proc. Cambridge Philos. Soc. 22:700–725.Crossref, Google Scholar
[21] Geyer CJ (1994) On the asymptotics of constrained M-estimation. Ann. Statist. 22(4):1993–2010.Crossref, Google Scholar
[22] Glorot X, Bordes A, Bengio Y (2011) Deep sparse rectifier neural networks. Proc. 14th Internat. Conf. Artificial Intelligence Statist. 315–323.Google Scholar
[23] Gürkan G, Yonca Özge A, Robinson S (1999) Sample-path solution of stochastic variational inequalities. Math. Programming 84:313–333.Crossref, Google Scholar
[24] Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. Le Cam LM, Neyman J, eds. Proc. Fifth Berkeley Sympos. Math. Statist. Probab., vol. 1 (University of California Press), 221–233.Google Scholar
[25] Jin C, Liu L, Ge R, Jordan M (2018) On the local minima of the empirical risk. Proc. Neural Inform. Processing Systems (NIPS).Google Scholar
[26] King AJ (1993) Asymptotic behaviour of solutions in stochastic optimization: nonsmooth analysis and the derivation of non-normal limit distributions. Unpublished PhD dissertation, University of Washington, Seattle.Google Scholar
[27] King A, Rockafellar RT (1993) Asymptotic theory for solutions in statistical estimation and stochastic programming. Math. Oper. Res. 18(1):148–162.Link, Google Scholar
[28] LeCam L (1970) On the assumptions used to prove asymptotic normality of maximum likelihood estimates. Ann. Math. Statist. 41(3):802–828.Crossref, Google Scholar
[29] Le Thi HA, Pham DT (2005) The DC programming and DCA revised with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 133:25–46.Google Scholar
[30] Loh PL (2017) Statistical consistency and asymptotic normality for high-dimensional robust M-estimators. Ann. Statist. 45(2):866–896.Crossref, Google Scholar
[31] Loh PL, Wainwright M (2012) High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity. Ann. Statist. 40(3):1637–1664.Crossref, Google Scholar
[32] Loh PL, Wainwright M (2015) Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Machine Learn. Res. 16(1):559–616.Google Scholar
[33] Lu Z, Zhou Z, Sun Z (2019) Enhanced proximal DC algorithms with extrapolation for a class of structured nonsmooth DC minimization. Math. Programming 176:369–401.Crossref, Google Scholar
[34] Luedtke AR, Van Der Laan MJ (2016) Statistical inference for the mean outcome under a possibly non-unique optimal treatment strategy. Ann. Statist. 44(2):713–742.Crossref, Google Scholar
[35] Ma J, Xu J, Maleki A (2019) Optimization-based AMP for phase retrieval: The impact of initialization and ℓ2 regularization. IEEE Trans. Inform. Theory 65(6):3600–3629.Crossref, Google Scholar
[36] Mei S, Bai Y, Montanari A (2018) The landscape of empirical risk for non-convex losses. Ann. Statist. 46:2747–2774.Crossref, Google Scholar
[37] Molchanov I (2005) Theory of Random Sets, vol. 19, no. 2 (Springer, London).Google Scholar
[38] Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. Proc. 27th Internat. Conf. Machine Learn., 807–814.Google Scholar
[39] Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J. Optim. 19(4):1574–1609.Crossref, Google Scholar
[40] Pang JS, Razaviyayn M, Alvarado A (2016) Computing B-stationary points of nonsmooth DC programs. Math. Oper. Res. 42(1):95–118.Link, Google Scholar
[41] Polyak B (1990) New stochastic approximation type procedures. Avtomatica i Telemekhanika 51(7):98–107.Google Scholar
[42] Polyak B, Juditsky A (1992) Acceleration of stochastic approximation by averaging. SIAM J. Control Optim. 30(4):838–855.Crossref, Google Scholar
[43] Robbins H, Monro S (1951) A stochastic approximation method. Ann. Math. Statist. 22(3):400–407.Crossref, Google Scholar
[44] Rockafellar RT (1970) Convex Analysis (Princeton University Press, Princeton).Crossref, Google Scholar
[45] Rockafellar RT, Wets RJ-B (1998) Variational Analysis (Springer, New York).Crossref, Google Scholar
[46] Royset J (2020) Approximations of semicontinuous functions with applications to stochastic optimization and statistical estimation. Math. Programming Ser. A 184:289–318.Crossref, Google Scholar
[47] Royset J, Wets RJ-B (2020) Variational analysis of constrained M-estimators. Ann. Statist. 48:2759–2790.Crossref, Google Scholar
[48] Scholtes S (2002) Introduction to Piecewise Differentiable Equations (Springer).Google Scholar
[49] Shapiro A (1989) Asymptotic properties of statistical estimators in stochastic programming. Ann. Statist. 17(2):841–858.Crossref, Google Scholar
[50] Shapiro A (2003) Monte Carlo sampling methods. Rusczyński A, Shapiro A, eds. Stochastic Programming, Handbooks in OR & MS, vol. 10 (NorthHolland Publishing Company, Amsterdam), 353–425.Crossref, Google Scholar
[51] Shapiro A, Xu H (2007) Uniform laws of large numbers for set-valued mappings and subdifferentials of random functions. J. Math. Anal. Appl. 325(2):1390–1399.Crossref, Google Scholar
[52] Shapiro A, Dentcheva D, Ruszczyński A (2009) Lectures on Stochastic Programming: Modeling and Theory (SIAM Publications, Philadelphia).Crossref, Google Scholar
[53] Shechtman Y, Eldar Y, Cohen O, Chapman H, Miao J, Segev M (2015) Phase retrieval with application to optical imaging: A contemporary overview. IEEE Signal Processing Magazine 32(3):87–109.Crossref, Google Scholar
[54] Sun J, Qu Q, Wright J (2018) A geometric analysis of phase retrieval. Foundations Comput. Math. 18:1131–1198.Crossref, Google Scholar
[55] van de Geer S (2000) Empirical Processes in M-estimation, vol. 6 (Cambridge University Press).Google Scholar
[56] van der Vaart AW (1998) Asymptotic Statistics, vol. 3 (Cambridge University Press).Crossref, Google Scholar
[57] van der Vaart AW, Wellner J (1996) Weak Convergence and Empirical Processes (Springer).Crossref, Google Scholar
[58] Xu H (2010) Sample average approximation methods for a class of stochastic variational inequality problems. Asia-Pacific J. Oper. Res. 27(1):103–119.Crossref, Google Scholar
[59] Xu H, Zhang D (2009) Smooth sample average approximation of stationary points in nonsmooth stochastic optimization and applications. Math. Programming 119:371–401.Crossref, Google Scholar
[60] Wald A (1949) Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20(4):595–601.Crossref, Google Scholar
[61] Wang G, Giannakis GB, Eldar YC (2018) Solving systems of random quadratic equations via truncated amplitude flow. IEEE Trans. Inform. Theory 64(2):773–794.Crossref, Google Scholar
[62] Wets RJB (1979) A statistical approach to the solution of stochastic programs with (convex) simple recourse. Working paper, University of Kentucky, Lexington.Google Scholar

cover image Mathematics of Operations Research

Volume 47, Issue 3

August 2022

Pages 1707-2545, C2

Article Information

Metrics

Information

Received:February 17, 2020
Accepted:June 30, 2021
Published Online:November 10, 2021

Cite as

Zhengling Qi, Ying Cui, Yufeng Liu, Jong-Shi Pang (2021) Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization. Mathematics of Operations Research 47(3):2034-2064.

https://doi.org/10.1287/moor.2021.1198

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Asymptotic Properties of Stationary Solutions of Coupled Nonconvex Nonsmooth Empirical Risk Minimization

References

Volume 47, Issue 3

Article Information

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News