Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

Published Online:https://doi.org/10.1287/opre.2019.1919

References

  • Beck A, Eldar YC (2013) Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J. Optim. 23(3):1480–1509.Google Scholar
  • Beck A, Tetruashvili L (2013) On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4):2037–2060.Google Scholar
  • Bertsekas D (2016) Nonlinear Programming, 3rd ed. (Athena Scientific, Nashua, NH).Google Scholar
  • Bertsimas D, Van Parys B (2017) Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
  • Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.CrossrefGoogle Scholar
  • Blumensath T, Davies M (2009) Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3):265–274.CrossrefGoogle Scholar
  • Breheny P, Huang J (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist. 5(1):232–253.CrossrefGoogle Scholar
  • Bühlmann P, van de Geer S (2011) Statistics for High-Dimensional Data (Springer, Berlin).CrossrefGoogle Scholar
  • Erdman C, Bates N (2014) The U.S. Census Bureau mail return rate challenge: Crowdsourcing to develop a hard-to-count score. Statistics Research Report 2014-08, U.S. Census Bureau, Washington, DC.Google Scholar
  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96(456):1348–1360.CrossrefGoogle Scholar
  • Friedman J, Hastie T, Tibshirani R (2010) Regularization paths for generalized linear models via coordinate descent. J. Statist. Software 33(1):1–22.CrossrefGoogle Scholar
  • Gamarnik D, Zadik I (2017) High dimensional regression with binary coefficients. estimating squared error and a phase transition. Kale S, Shamir O, eds. Proc. 2017 Conf. Learn. Theory (PMLR, Amsterdam), 948–953.Google Scholar
  • Greenshtein E (2006) Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Statist. 34(5):2367–2386.CrossrefGoogle Scholar
  • Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J. Environ. Econom. Management 5(1):81–102.Google Scholar
  • Hastie T, Tibshirani R, Tibshirani RJ (2017) Extended comparisons of best subset selection, forward stepwise selection, and the Lasso. Working paper, Stanford University, Stanford, CA.Google Scholar
  • Hastie T, Tibshirani R, Wainwright M (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • He R, McAuley J (2016) Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. Proc. 25th Internat. Conf. World Wide Web (International World Wide Web Conferences Steering Committee, Geneva, Switzerland), 507–517.Google Scholar
  • Lu Z (2014) Iterative hard thresholding methods for l0 regularized convex cone programming. Math. Programming 147(1):125–154.Google Scholar
  • Mazumder R, Radchenko P (2017) The discrete Dantzig selector: Estimating sparse linear models via mixed integer linear optimization. IEEE Trans. Inform. Theory 63(5):3053–3075.Google Scholar
  • Mazumder R, Friedman JH, Hastie T (2011) Sparsenet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106(495):1125–1138.Google Scholar
  • Mazumder R, Radchenko P, Dedieu A (2017) Subset selection with shrinkage: Sparse linear modeling when the SNR is low. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
  • Miller A (2002) Subset Selection in Regression (CRC Press, Boca Raton, FL).CrossrefGoogle Scholar
  • Natarajan B (1995) Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2):227–234.CrossrefGoogle Scholar
  • Nesterov Y (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.CrossrefGoogle Scholar
  • Patrascu A, Necoara I (2015) Random coordinate descent methods for ℓ0 regularized convex optimization. IEEE Trans. Automatic Control 60(7):1811–1824.Google Scholar
  • Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, et al. Scikit-learn: Machine learning in Python. J. Machine Learn. Res. 12(2011):2825–2830.Google Scholar
  • Raskutti G, Wainwright M, Yu B (2011) Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. IEEE Trans. Inform. Theory 57(10):6976–6994.CrossrefGoogle Scholar
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58(1):267–288.CrossrefGoogle Scholar
  • Tibshirani RJ (2013) The lasso problem and uniqueness. Electronic J. Statist. 7:1456–1490.CrossrefGoogle Scholar
  • Tibshirani R, Bien J, Friedman J, Hastie T, Simon N, Taylor J, Tibshirani RJ (2012) Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. Ser. B 74(2):245–266.CrossrefGoogle Scholar
  • Tseng P (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3):475–494.CrossrefGoogle Scholar
  • Wainwright MJ (2009) Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory 55(12):5728–5741.CrossrefGoogle Scholar
  • Zhang C-H (2010) Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38(2):894–942.CrossrefGoogle Scholar
  • Zhang C-H, Zhang T (2012) A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27(4):576–593.CrossrefGoogle Scholar
  • Zhang Y, Wainwright M, Jordan MI (2014) Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. Proc. 2014 Conf. Learn. Theory (PMLR, Amsterdam), 921–948.Google Scholar
  • Zou H, Hastie T (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67(2):301–320.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.