Convex Optimization for Group Feature Selection in Networked Data

Published Online:https://doi.org/10.1287/ijoc.2018.0868

References

  • Amaldi E, Kann V (1998) On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoret. Comput. Sci. 209(1):237–260.CrossrefGoogle Scholar
  • Assaf M, Jagannathan K, Calhoun VD, Miller L, Stevens MC, Sahl R, O’Boyle JG, Schultz RT, Pearlson GD (2010) Abnormal functional connectivity of default mode sub-networks in autism spectrum disorder patients. NeuroImage 53(1):247–256.CrossrefGoogle Scholar
  • Bach F, Jenatton R, Mairal J, Obozinski G (2011) Convex optimization with sparsity-inducing norms. Sra S, Nowozin S, Wright SJ, eds. Optimization for Machine Learning (MIT Press, Cambridge, MA), 19–53.CrossrefGoogle Scholar
  • Bach F, Obozinski G (2010) Sparse methods for machine learning theory and algorithms. ECML/PKDD Tutorial, https://www.di.ens.fr/∼fbach/ecml2010tutorial/.Google Scholar
  • Bertolazzi P, Felici G, Festa P, Fiscon G, Weitschek E (2016) Integer programming models for feature selection: New extensions and a randomized solution algorithm. Eur. J. Oper. Res. 250(2):389–399.CrossrefGoogle Scholar
  • Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.CrossrefGoogle Scholar
  • Bhaskara A, Charikar M, Chlamtac E, Feige U, Vijayaraghavan A (2010) Detecting high log-densities: An O(n14) approximation for densest k-subgraph. Mitzenmacher M, Schulman LJ, eds. Proc. 42nd ACM Sympos. Theory Comput. (ACM, New York), 201–210.CrossrefGoogle Scholar
  • Borgwardt S, Schmiedl F (2014) Threshold-based preprocessing for approximating the weighted dense k-subgraph problem. Eur. J. Oper. Res. 234(3):631–640.CrossrefGoogle Scholar
  • Bounova G, de Weck O (2012) Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles. Physical Rev. E 85(1):016117.CrossrefGoogle Scholar
  • Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. Shavlik JW, ed. Proc. 15th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers, San Francisco), 82–90.Google Scholar
  • Chan AB, Vasconcelos N, Lanckriet GR (2007) Direct convex relaxations of sparse SVM. Ghahramani Z. ed. Proc. 24th Internat. Conf. Machine Learn., 2007 Internat. Conf. Inductive Logical Programming, Corvallis, OR, 145–153.CrossrefGoogle Scholar
  • Chang CY (1972) Dynamic programming as applied to feature subset selection in a pattern recognition system. Donovan JJ, ed. Proc. ACM Annual Conf., vol. 1 (ACM, New York), 94–103.CrossrefGoogle Scholar
  • Chaplot S, Patnaik L, Jagannathan N (2006) Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomedical Signal Process. Control 1(1):86–92.CrossrefGoogle Scholar
  • Chen CP, Keown CL, Jahedi A, Nair A, Pflieger ME, Bailey BA, Müller RA (2015) Diagnostic classification of intrinsic functional connectivity highlights somatosensory, default mode, and visual regions in autism. NeuroImage: Clin. 8:238–245.CrossrefGoogle Scholar
  • Chen J, Ye J (2008) Training svm with indefinite kernels. McCallum A, Roweis S, eds. Proc. 25th Internat. Conf. Machine Learn. (Omin Press, Madison, WI), 136–143.CrossrefGoogle Scholar
  • Corneil DG, Perl Y (1984) Clustering and domination in perfect graphs. Discrete Appl. Math. 9(1):27–39.CrossrefGoogle Scholar
  • Craddock RC, Holtzheimer PE, Hu XP, Mayberg HS (2009) Disease state prediction from resting state functional connectivity. Magnetic Resonance Medicine 62(6):1619–1628.CrossrefGoogle Scholar
  • Erdös P, Rényi A (1959) On Random Graphs, vol. 6 (Publicationes Mathematicae, Debrecen, Hungary), 290–297.Google Scholar
  • Fang SC, Lin CJ, Wu SY (2001) Solving quadratic semi-infinite programming problems by using relaxed cutting-plane scheme. J. Comput. Appl. Math. 129(1):89–104.CrossrefGoogle Scholar
  • Fung G, Mangasarian OL (2000) Data selection for support vector machine classifiers. Ramakrishnan R, ed. Proc. 6th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 64–70.CrossrefGoogle Scholar
  • Guan W, Gray A, Leyffer S (2009) Mixed-integer support vector machine. LeCun Y, Zemel R, eds. NIPS Workshop on Optimization for Machine Learning (Curran Associates, Red Hook, NY).Google Scholar
  • Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J. Machine Learn. Res. 3(March):1157–1182.Google Scholar
  • Han J, Sun Z, Hao H (2015) l 0-norm based structural sparse least square regression for feature selection. Pattern Recognition 48(12):3927–3940.CrossrefGoogle Scholar
  • Hettich R, Kortanek KO (1993) Semi-infinite programming: Theory, methods, and applications. SIAM Rev. 35(3):380–429.CrossrefGoogle Scholar
  • Hull JV, Jacokes ZJ, Torgerson CM, Irimia A, Van Horn JD (2016) Resting-state functional connectivity in autism spectrum disorders: A review. Frontiers Psychiatry 7:205.CrossrefGoogle Scholar
  • Jacob L, Obozinski G, Vert JP (2009) Group lasso with overlap and graph lasso. Jones B, ed. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 433–440.CrossrefGoogle Scholar
  • Jenatton R, Audibert JY, Bach F (2011) Structured variable selection with sparsity-inducing norms. J. Machine Learn. Res. 12:2777–2824.Google Scholar
  • Kim S, Xing EP (2012) Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. Ann. Appl. Statist. 6(3):1095–1117.CrossrefGoogle Scholar
  • Kim SJ, Boyd S (2008) A minimax theorem with applications to machine learning, signal processing, and finance. SIAM J. Optim. 19(3):1344–1367.CrossrefGoogle Scholar
  • Klöppel S, Abdulkadir A, Jack CR, Koutsouleris N, Mourão-Miranda J, Vemuri P (2012) Diagnostic neuroimaging across diseases. NeuroImage 61(2):457–463.CrossrefGoogle Scholar
  • Lanckriet GR, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J. Machine Learn. Res. 5:27–72.Google Scholar
  • Li YF, Kwok JT, Tsang IW, Zhou ZH (2009a) A convex method for locating regions of interest with multi-instance learning. Altun Y, Das K, Mielikäinen T, Malerba D, Stefanowski J, Read J, Žitnik M, Ceci M, Džeroski S, eds. Machine Learning and Knowledge Discovery in Databases (Springer, New York), 15–30.CrossrefGoogle Scholar
  • Li YF, Tsang IW, Kwok JT, Zhou ZH (2009b) Tighter and convex maximum margin clustering. van Dyk D, Welling M, eds. Internat. Conf. Artificial Intelligence Statistics (Microtome, Brookline, MA), 344–351.Google Scholar
  • Lin D, Pitler E, Foster DP, Ungar LH (2008) In defense of ℓ0. McCallum A, Roweis S, eds. Proc. 25th Internat. Conf. Machine Learn. (Omni Press, Madison, WI).Google Scholar
  • Liu J, Ji S, Ye J (2009) SLEP: Sparse Learning with Efficient Projections (Arizona State University, Phoenix).Google Scholar
  • Lu Z, Zhang Y (2010) Penalty decomposition methods for ℓ0-norm minimization. Technical report, Simon Fraser University, BC, Canada.Google Scholar
  • Macambira EM, De Souza CC (2000) The edge-weighted clique problem: Valid inequalities, facets and polyhedral computations. Eur. J. Oper. Res. 123(2):346–371.CrossrefGoogle Scholar
  • Maldonado S, Pérez J, Weber R, Labbé M (2014) Feature selection for support vector machines via mixed integer linear programming. Inform. Sci. 279:163–175.CrossrefGoogle Scholar
  • McAuley J, Ming J, Stewart D, Hanna P (2005) Subband correlation and robust speech recognition. IEEE Trans. Speech Audio Process. 13(5):956–964.CrossrefGoogle Scholar
  • Mourão-Miranda J, Bokde AL, Born C, Hampel H, Stetter M (2005) Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data. NeuroImage 28(4):980–995.CrossrefGoogle Scholar
  • Natarajan BK (1995) Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2):227–234.CrossrefGoogle Scholar
  • Ng AY (2004) Feature selection, ℓ1vs. ℓ2regularization, and rotational invariance. Brodley CE, ed. Proc. 21st Internat. Conf. Machine Learn., vol. 78 (ACM, New York).CrossrefGoogle Scholar
  • Pee E, Royset JO (2011) On solving large-scale finite minimax problems using exponential smoothing. J. Optim. Theory Appl. 148(2):390–421.CrossrefGoogle Scholar
  • Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, Church JA, Vogel AC, Laumann TO, Miezin FM, Schlaggar BL (2011) Functional network organization of the human brain. Neuron 72(4):665–678.CrossrefGoogle Scholar
  • Rakotomamonjy A (2003) Variable selection using svm based criteria. J. Machine Learn. Res. 3:1357–1370.Google Scholar
  • Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. Ghahramani Z, ed. Proc. 24th Internat. Conf. Machine Learn. (Omni Press, Madison, WI), 775–782.CrossrefGoogle Scholar
  • Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2008) SimpleMKL. J. Machine Learn. Res. 9:2491–2521.Google Scholar
  • Rao N, Cox C, Nowak R, Rogers TT (2013) Sparse overlapping sets lasso for multitask learning and its application to fMRI analysis. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 2202–2210.Google Scholar
  • Rosa MJ, Portugal L, Hahn T, Fallgatter AJ, Garrido MI, Shawe-Taylor J, Mourao-Miranda J (2015) Sparse network-based models for patient classification using fMRI. NeuroImage 105(15):493–506.CrossrefGoogle Scholar
  • Sion M (1958) On general minimax theorems. Pacific J. Math. 8(1):171–176.CrossrefGoogle Scholar
  • Sporns O (2011) Networks of the Brain (MIT Press, Cambridge).Google Scholar
  • Sun Z, Ampornpunt N, Varma M, Vishwanathan S (2010) Multiple kernel learning and the SMO algorithm. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 2361–2369.Google Scholar
  • Tan M, Wang L, Tsang IW (2010) Learning sparse svm for feature selection on very high dimensional datasets. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Machine Learn. (ICML-10) (Omni Press, Madison, WI), 1047–1054.Google Scholar
  • Tang J, Alelyani S, Liu H (2014) Feature selection for classification: A review. Aggaarwal CC, ed. Data Classification: Algorithms and Applications, vol. 37 (CRC Press, Boca Raton, FL).Google Scholar
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B: Methodology 58(1):267–288.Google Scholar
  • Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J. Royal Statist. Soc.: Ser. B: Statist. Methodology 67(1):91–108.CrossrefGoogle Scholar
  • Wang H, Leng C (2008) A note on adaptive group lasso. Comput. Statist. Data Anal. 52(12):5277–5286.CrossrefGoogle Scholar
  • Wang L, Shen X (2007) On ℓ1-norm multiclass support vector machines. J. Amer. Statist. Assoc. 102(478):583–594.CrossrefGoogle Scholar
  • Weng SJ, Wiggins JL, Peltier SJ, Carrasco M, Risi S, Lord C, Monk CS (2010) Alterations of resting state functional connectivity in the default network in adolescents with autism spectrum disorders. Brain Res. 1313:202–214.CrossrefGoogle Scholar
  • Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J. Machine Learn. Res. 3:1439–1461.Google Scholar
  • Won D (2016) Optimization and machine learning frameworks for complex network analysis. PhD thesis, University of Washington, Seattle.Google Scholar
  • Xie J, Zeng L (2010) Group variable selection methods and their applications in analysis of genomic data. Feng J, Fu W, Sun F, eds. Frontiers Computational and Systems Biology (Springer, New York), 231–248.CrossrefGoogle Scholar
  • Xu Z, Jin R, Ye J, Lyu MR, King I (2009) Non-monotonic feature selection. Leon Bottou L, Littman M, eds. Proc. 26th Annual Internat. Conf. Machine Learn. (Omni Press, Madison, WI), 1145–1152.CrossrefGoogle Scholar
  • Yang C, Wan X, Yang Q, Xue H, Yu W (2010) Identifying main effects and epistatic interactions from large-scale snp data via adaptive group lasso. BMC Bioinform. 11(1):S18.CrossrefGoogle Scholar
  • Yeo BT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, et al.. (2011) The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106(3):1125–1165.CrossrefGoogle Scholar
  • Yuan GX, Chang KW, Hsieh CJ, Lin CJ (2010) A comparison of optimization methods and software for large-scale l1-regularized linear classification. J. Machine Learn. Res. 11(1–4):3183–3234.Google Scholar
  • Yuan L, Liu J, Ye J (2011) Efficient methods for overlapping group lasso. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 352–360.Google Scholar
  • Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Royal Statist. Soc.: Ser. B: Statist. Methodology 68(1):49–67.CrossrefGoogle Scholar
  • Zeng LL, Shen H, Liu L, Wang L, Li B, Fang P, Zhou Z, Li Y, Hu D (2012) Identifying major depression using whole-brain functional connectivity: A multivariate pattern analysis. Brain 135(5):1498–1507.CrossrefGoogle Scholar
  • Zhang CH, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist. 36(4)1567–1594.CrossrefGoogle Scholar
  • Zhang L, Wu SY, López MA (2010) A new exchange method for convex semi-infinite programming. SIAM J. Optim. 20(6):2959–2977.CrossrefGoogle Scholar
  • Zou H (2006) The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.CrossrefGoogle Scholar
INFORMS site uses cookies to store information on your computer. Some are essential to make our site work; Others help us improve the user experience. By using this site, you consent to the placement of these cookies. Please read our Privacy Statement to learn more.