Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms
Published Online:13 Aug 2020https://doi.org/10.1287/opre.2019.1919
References
- (2013) Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J. Optim. 23(3):1480–1509.Google Scholar
- (2013) On the convergence of block coordinate descent type methods. SIAM J. Optim. 23(4):2037–2060.Google Scholar
- (2016) Nonlinear Programming, 3rd ed. (Athena Scientific, Nashua, NH).Google Scholar
- (2017) Sparse high-dimensional regression: Exact scalable algorithms and phase transitions. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
- (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.Crossref, Google Scholar
- (2009) Iterative hard thresholding for compressed sensing. Appl. Comput. Harmonic Anal. 27(3):265–274.Crossref, Google Scholar
- (2011) Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Statist. 5(1):232–253.Crossref, Google Scholar
- (2011) Statistics for High-Dimensional Data (Springer, Berlin).Crossref, Google Scholar
- (2014) The U.S. Census Bureau mail return rate challenge: Crowdsourcing to develop a hard-to-count score. Statistics Research Report 2014-08, U.S. Census Bureau, Washington, DC.Google Scholar
- (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96(456):1348–1360.Crossref, Google Scholar
- (2010) Regularization paths for generalized linear models via coordinate descent. J. Statist. Software 33(1):1–22.Crossref, Google Scholar
- (2017) High dimensional regression with binary coefficients. estimating squared error and a phase transition. Kale S, Shamir O, eds. Proc. 2017 Conf. Learn. Theory (PMLR, Amsterdam), 948–953.Google Scholar
- (2006) Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1 constraint. Ann. Statist. 34(5):2367–2386.Crossref, Google Scholar
- (1978) Hedonic housing prices and the demand for clean air. J. Environ. Econom. Management 5(1):81–102.Google Scholar
- (2017) Extended comparisons of best subset selection, forward stepwise selection, and the Lasso. Working paper, Stanford University, Stanford, CA.Google Scholar
- (2015) Statistical Learning with Sparsity: The Lasso and Generalizations (CRC Press, Boca Raton, FL).Crossref, Google Scholar
- (2016) Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. Proc. 25th Internat. Conf. World Wide Web (International World Wide Web Conferences Steering Committee, Geneva, Switzerland), 507–517.Google Scholar
- (2014) Iterative hard thresholding methods for l0 regularized convex cone programming. Math. Programming 147(1):125–154.Google Scholar
- (2017) The discrete Dantzig selector: Estimating sparse linear models via mixed integer linear optimization. IEEE Trans. Inform. Theory 63(5):3053–3075.Google Scholar
- (2011) Sparsenet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106(495):1125–1138.Google Scholar
- (2017) Subset selection with shrinkage: Sparse linear modeling when the SNR is low. Working paper, Massachusetts Institute of Technology, Cambridge.Google Scholar
- (2002) Subset Selection in Regression (CRC Press, Boca Raton, FL).Crossref, Google Scholar
- (1995) Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2):227–234.Crossref, Google Scholar
- (2012) Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM J. Optim. 22(2):341–362.Crossref, Google Scholar
- (2015) Random coordinate descent methods for ℓ0 regularized convex optimization. IEEE Trans. Automatic Control 60(7):1811–1824.Google Scholar
- Scikit-learn: Machine learning in Python. J. Machine Learn. Res. 12(2011):2825–2830.Google Scholar
- (2011) Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. IEEE Trans. Inform. Theory 57(10):6976–6994.Crossref, Google Scholar
- (1996) Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58(1):267–288.Crossref, Google Scholar
- (2013) The lasso problem and uniqueness. Electronic J. Statist. 7:1456–1490.Crossref, Google Scholar
- (2012) Strong rules for discarding predictors in lasso-type problems. J. Roy. Statist. Soc. Ser. B 74(2):245–266.Crossref, Google Scholar
- (2001) Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109(3):475–494.Crossref, Google Scholar
- (2009) Information-theoretic limits on sparsity recovery in the high-dimensional and noisy setting. IEEE Trans. Inform. Theory 55(12):5728–5741.Crossref, Google Scholar
- (2010) Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38(2):894–942.Crossref, Google Scholar
- (2012) A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27(4):576–593.Crossref, Google Scholar
- (2014) Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. Proc. 2014 Conf. Learn. Theory (PMLR, Amsterdam), 921–948.Google Scholar
- (2005) Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67(2):301–320.Crossref, Google Scholar

