Convex and Nonconvex Risk-Based Linear Regression at Scale

Published Online:https://doi.org/10.1287/ijoc.2023.1282

References

  • Alfons A, Croux C, Gelper S (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. Ann. Appl. Statist. 7(1):226–248.CrossrefGoogle Scholar
  • Bengio Y, Louradour J, Collobert R, Weston J (2009) Curriculum learning. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 41–48.Google Scholar
  • Bertsekas D, Nedic A, Ozdaglar A (2003) Convex Analysis and Optimization (Athena Scientific, Belmont, MA).Google Scholar
  • Bhatia R (1997) Matrix Analysis (Springer-Verlag, New York).CrossrefGoogle Scholar
  • Brucker P (1984) An O(n) algorithm for quadratic knapsack problems. Oper. Res. Lett. 3(3):163–166.CrossrefGoogle Scholar
  • Cui Y, Ding C, Li XD, Zhao XY (2021) Augmented Lagrangian methods for convex optimization problems. J. Oper. Res. Soc. China 10(2):305–342.CrossrefGoogle Scholar
  • El Ghaoui L, Viallon V, Rabbani T (2012) Safe feature elimination for the lasso and sparse supervised learning problems. Pacific J. Optim. 8(4):667–698.Google Scholar
  • Facchinei F, Pang JS (2007) Finite-Dimensional Variational Inequalities and Complementarity Problems (Springer Science & Business Media, New York).Google Scholar
  • Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J. Roy. Statist. Soc. B 70(5):849–911.CrossrefGoogle Scholar
  • Fan K (1951) Maximum properties and inequalities for the eigenvalues of completely continuous operators. Proc. Natl. Acad. Sci. USA 37(11):760–766.CrossrefGoogle Scholar
  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • Held M, Wolfe P, Crowder HP (1974) Validation of subgradient optimization. Math. Programming 6(1):62–88.CrossrefGoogle Scholar
  • Helgason K, Kennington J, Lall H (1980) A polynomially bounded algorithm for a singly constrained quadratic program. Math. Programming 18(1):338–343.CrossrefGoogle Scholar
  • Hiriart-Urruty JB, Strodiot JJ, Nguyen VH (1984) Generalized Hessian matrix and second-order optimality conditions for problems with C1,1 data. Appl. Math. Optim. 11(1):43–56.CrossrefGoogle Scholar
  • Horn RA, Johnson CR (2013) Matrix Analysis, 2nd ed. (Cambridge University Press, Cambridge, UK).Google Scholar
  • Huang L, Jia J, Yu B, Chun BG, Maniatis P, Naik M (2010) Predicting execution time of computer programs using sparse polynomial regression. Lafferty J, Williams C, Shawe-Taylor J, Zemel R, Culotta A, eds. Adv. Neural Inform. Processing Systems, vol. 23 (NeurIPS, San Diego), 883–891.Google Scholar
  • Levy D, Carmon Y, Duchi JC, Sidford A (2020) Large-scale methods for distributionally robust optimization. Larochelle H, Ranzato M, Hadsell R, Balcan MF, Lin H, eds. Adv. Neural Inform. Processing Systems, vol. 33 (NeurIPS, San Diego), 8847–8860.Google Scholar
  • Li XD, Sun DF, Toh KC (2018) On efficiently solving the subproblems of a level-set method for fused lasso problems. SIAM J. Optim. 28(2):1842–1866.CrossrefGoogle Scholar
  • Li XD, Sun DF, Toh KC (2020) An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming. SIAM J. Optim. 30(3):2410–2440.CrossrefGoogle Scholar
  • Lin MX, Yuan YC, Sun DF, Toh KC (2020) Adaptive sieving with PPDNA: Generating solution paths of exclusive lasso models. Preprint, submitted September 18, https://arxiv.org/abs/2009.08719.Google Scholar
  • Lyu S, Fan Y, Ying Y, Hu BG (2020) Average top-k aggregate loss for supervised learning. IEEE Trans. Pattern Anal. Machine Intelligence 44(1):76–86.CrossrefGoogle Scholar
  • Ndiaye E, Fercoq O, Salmon J (2021) Screening rules and its complexity for active set identification. J. Convex Anal. 28(4):1053–1072.Google Scholar
  • Ndiaye E, Fercoq O, Gramfort A, Salmon J (2015) Gap safe screening rules for sparse multi-task and multi-class models. Cortes C, Lawrence N, Lee D, Sugiyama M, Garnett R, eds. Adv. Neural Inform. Processing Systems, vol. 28 (NeurIPS, San Diego), 811–819.Google Scholar
  • Ortega JM, Rheinboldt WC (2000) Iterative Solution of Nonlinear Equations in Several Variables, Classics in Applied Mathematics, vol. 30 (SIAM, Philadelphia).CrossrefGoogle Scholar
  • Overton ML, Womersley RS (1993) Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices. Math. Programming 62(1):321–357.CrossrefGoogle Scholar
  • Pardalos PM, Kovoor N (1990) An algorithm for a singly constrained class of quadratic programs subject to upper and lower bounds. Math. Programming 46(1):321–328.CrossrefGoogle Scholar
  • Rockafellar RT (1976) Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Math. Oper. Res. 1(2):97–116.LinkGoogle Scholar
  • Rockafellar RT, Uryasev S (2000) Optimization of conditional value-at-risk. J. Risk 2(3):21–41.CrossrefGoogle Scholar
  • Rockafellar RT, Wets RJB (1998) Variational Analysis (Springer-Verlag, Berlin).CrossrefGoogle Scholar
  • Rousseeuw PJ (1984) Least median of squares regression. J. Amer. Statist. Assoc. 79(388):871–880.CrossrefGoogle Scholar
  • Shibagaki A, Karasuyama M, Hatano K, Takeuchi I (2016) Simultaneous safe screening of features and samples in doubly sparse modeling. Balcan MF, Weinberger KQ, eds. Proc. 33rd Internat. Conf. Machine Learn., vol. 48 (ICML, San Diego), 1577–1586.Google Scholar
  • Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. Proc. IEEE Conf. Comput. Vision Pattern Recognition, 761–769.Google Scholar
  • Sun J (1986) On monotropic piecewise quadratic programming. Unpublished PhD thesis, Department of Mathematics, University of Washington, Seattle.Google Scholar
  • Tang P, Wang C, Sun D, Toh KC (2020) A sparse semismooth Newton based proximal majorization-minimization algorithm for nonconvex square-root-loss regression problems. J. Machine Learn. Res. 21(226):1–38.Google Scholar
  • Tao PD, An LTH (1997) Convex analysis approach to DC programming: Theory, algorithms and applications. Acta Mathematica Vietnamica 22(1):289–355.Google Scholar
  • Tibshirani R, Bien J, Friedman J, Hastie T, Simon N, Taylor J, Tibshirani RJ (2012) Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. B 74(2):245–266.CrossrefGoogle Scholar
  • Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-lasso. J. Bus. Econom. Statist. 25(3):347–355.CrossrefGoogle Scholar
  • Wang J, Zhou J, Liu J, Wonka P, Ye J (2014) A safe screening rule for sparse logistic regression. Ghahramani Z, Welling M, Cortes C, Lawrence N, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, vol. 27 (NeurIPS, San Diego), 1053–1061.Google Scholar
  • Watson GA (1992) Linear best approximation using a class of polyhedral norms. Numer. Algorithms 2(3):321–336.CrossrefGoogle Scholar
  • Wu B, Ding C, Sun DF, Toh KC (2014) On the Moreau-Yosida regularization of the vector k-norm related functions. SIAM J. Optim. 24(2):766–794.CrossrefGoogle Scholar
  • Wu C, Cui Y, Li D, Sun D (2022) Convex and nonconvex risk-based linear regression at scale. URL http://dx.doi.org/10.5281/zenodo.7483279, available at https://github.com/INFORMSJoC/2022.0012.Google Scholar
  • Xu H, Caramanis C, Mannor S (2010) Robust regression and lasso. IEEE Trans. Inform. Theory 56(7):3561–3574.CrossrefGoogle Scholar
  • Zhao XY, Sun DF, Toh KC (2010) A Newton-CG augmented Lagrangian method for semidefinite programming. SIAM J. Optim. 20(4):1737–1765.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.