Convex and Nonconvex Risk-Based Linear Regression at Scale

Can Wu
Can Wu
[email protected]
School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China;Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;
Search for more papers by this author
,
Ying Cui
Corresponding Author
Ying Cui
[email protected]
https://orcid.org/0000-0003-4173-5647
Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455
Search for more papers by this author
,
Donghui Li
Donghui Li
[email protected]
School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China;
Search for more papers by this author
,
Defeng Sun
Defeng Sun
[email protected]
Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;
Search for more papers by this author

School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China;Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;

Search for more papers by this author

Ying Cui

Corresponding Author

Ying Cui

[email protected]

https://orcid.org/0000-0003-4173-5647

Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455

Search for more papers by this author

Donghui Li

[email protected]

School of Mathematical Sciences, South China Normal University, Guangzhou, 510631, China;

Search for more papers by this author

Defeng Sun

[email protected]

Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Hong Kong;

Search for more papers by this author

Published Online:11 Apr 2023https://doi.org/10.1287/ijoc.2023.1282

References

Alfons A, Croux C, Gelper S (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Statist. 7(1):226–248.Crossref, Google Scholar
Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 41–48.Google Scholar
Bertsekas D, Nedic A, Ozdaglar A (2003) Convex Analysis and Optimization (Athena Scientific, Belmont, MA).Google Scholar
Bhatia R (1997) Matrix Analysis (Springer-Verlag, New York).Crossref, Google Scholar
Brucker P (1984) An O(n) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3):163–166.Crossref, Google Scholar
Cui Y, Ding C, Li XD, Zhao XY (2021) Augmented Lagrangian methods for convex optimization problems. J. Oper. Res. Soc. China 10(2):305–342.Crossref, Google Scholar
El Ghaoui L, Viallon V, Rabbani T (2012) Safe feature elimination for the lasso and sparse supervised learning problems. Pacific J. Optim. 8(4):667–698.Google Scholar
Facchinei F, Pang JS (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, New York).Google Scholar
Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J. Roy. Statist. Soc. B 70(5):849–911.Crossref, Google Scholar
Fan K (1951) Maximum properties and inequalities for the eigenvalues of completely continuous operators. Proc. Natl. Acad. Sci. USA 37(11):760–766.Crossref, Google Scholar
Hastie T, Tibshirani R, Wainwright M (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).Crossref, Google Scholar
Held M, Wolfe P, Crowder HP (1974) Validation of subgradient optimization. Math. Programming 6(1):62–88.Crossref, Google Scholar
Helgason K, Kennington J, Lall H (1980) A polynomially bounded algorithm for a singly constrained quadratic program. Math. Programming 18(1):338–343.Crossref, Google Scholar
Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Generalized Hessian matrix and second-order optimality conditions for problems with C1,1 data. Appl. Math. Optim. 11(1):43–56.Crossref, Google Scholar
Horn RA, Johnson CR (2013) Matrix Analysis, 2nd ed. (Cambridge University Press, Cambridge, UK).Google Scholar
Huang L, Jia J, Yu B, Chun BG, Maniatis P, Naik M (2010) Predicting execution time of computer programs using sparse polynomial regression. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (NeurIPS, San Diego), 883–891.Google Scholar
Levy D, Carmon Y, Duchi JC, Sidford A (2020) Large-scale methods for distributionally robust optimization. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (NeurIPS, San Diego), 8847–8860.Google Scholar
Li XD, Sun DF, Toh KC (2018) On efficiently solving the subproblems of a level-set method for fused lasso problems. SIAM J. Optim. 28(2):1842–1866.Crossref, Google Scholar
Li XD, Sun DF, Toh KC (2020) An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming. SIAM J. Optim. 30(3):2410–2440.Crossref, Google Scholar
Lin MX, Yuan YC, Sun DF, Toh KC (2020) Adaptive sieving with PPDNA: Generating solution paths of exclusive lasso models. Preprint, submitted September 18, https://arxiv.org/abs/2009.08719.Google Scholar
Lyu S, Fan Y, Ying Y, Hu BG (2020) Average top-k aggregate loss for supervised learning. IEEE Trans. Pattern Anal. Machine Intelligence 44(1):76–86.Crossref, Google Scholar
Ndiaye E, Fercoq O, Salmon J (2021) Screening rules and its complexity for active set identification. J. Convex Anal. 28(4):1053–1072.Google Scholar
Ndiaye E, Fercoq O, Gramfort A, Salmon J (2015) Gap safe screening rules for sparse multi-task and multi-class models. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (NeurIPS, San Diego), 811–819.Google Scholar
Ortega JM, Rheinboldt WC (2000) Iterative Solution of Nonlinear Equations in Several Variables, Classics in Applied Mathematics, vol. 30 (SIAM, Philadelphia).Crossref, Google Scholar
Overton ML, Womersley RS (1993) Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Programming 62(1):321–357.Crossref, Google Scholar
Pardalos PM, Kovoor N (1990) An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds. Math. Programming 46(1):321–328.Crossref, Google Scholar
Rockafellar RT (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2):97–116.Link, Google Scholar
Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–41.Crossref, Google Scholar
Rockafellar RT, Wets RJB (1998) Variational Analysis (Springer-Verlag, Berlin).Crossref, Google Scholar
Rousseeuw PJ (1984) Least median of squares regression. J. Amer. Statist. Assoc. 79(388):871–880.Crossref, Google Scholar
Shibagaki A, Karasuyama M, Hatano K, Takeuchi I (2016) Simultaneous safe screening of features and samples in doubly sparse modeling. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (ICML, San Diego), 1577–1586.Google Scholar
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. Proc. IEEE Conf. Comput. Vision Pattern Recognition, 761–769.Google Scholar
Sun J (1986) On monotropic piecewise quadratic programming. Unpublished PhD thesis, Department of Mathematics, University of Washington, Seattle.Google Scholar
Tang P, Wang C, Sun D, Toh KC (2020) A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems. J. Machine Learn. Res. 21(226):1–38.Google Scholar
Tao PD, An LTH (1997) Convex analysis approach to DC programming: Theory, algorithms and applications. Acta Mathematica Vietnamica 22(1):289–355.Google Scholar
Tibshirani R, Bien J, Friedman J, Hastie T, Simon N, Taylor J, Tibshirani RJ (2012) Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. B 74(2):245–266.Crossref, Google Scholar
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. J. Bus. Econom. Statist. 25(3):347–355.Crossref, Google Scholar
Wang J, Zhou J, Liu J, Wonka P, Ye J (2014) A safe screening rule for sparse logistic regression. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (NeurIPS, San Diego), 1053–1061.Google Scholar
Watson GA (1992) Linear best approximation using a class of polyhedral norms. Numer. Algorithms 2(3):321–336.Crossref, Google Scholar
Wu B, Ding C, Sun DF, Toh KC (2014) On the Moreau-Yosida regularization of the vector k-norm related functions. SIAM J. Optim. 24(2):766–794.Crossref, Google Scholar
Wu C, Cui Y, Li D, Sun D (2022) Convex and nonconvex risk-based linear regression at scale. URL http://dx.doi.org/10.5281/zenodo.7483279, available at https://github.com/INFORMSJoC/2022.0012.Google Scholar
Xu H, Caramanis C, Mannor S (2010) Robust regression and lasso. IEEE Trans. Inform. Theory 56(7):3561–3574.Crossref, Google Scholar
Zhao XY, Sun DF, Toh KC (2010) A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4):1737–1765.Crossref, Google Scholar

cover image INFORMS Journal on Computing

Volume 35, Issue 4

July-August 2023

Pages 711-908, C2

Article Information

Supplemental Material

Metrics

Information

Received:January 09, 2022
Accepted:January 23, 2023
Published Online:April 11, 2023

Cite as

Can Wu, Ying Cui, Donghui Li, Defeng Sun (2023) Convex and Nonconvex Risk-Based Linear Regression at Scale. INFORMS Journal on Computing 35(4):797-816.

https://doi.org/10.1287/ijoc.2023.1282

Keywords

Acknowledgments

The authors thank anonymous reviewers and the associate editor for the valuable comments and suggestions that helped improve the paper and software greatly.

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Convex and Nonconvex Risk-Based Linear Regression at Scale

References

Volume 35, Issue 4

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News