Convex Optimization for Group Feature Selection in Networked Data

Daehan Won
Corresponding Author
Daehan Won
Systems Science and Industrial Engineering Department, Binghamton University, the State University of New York, New York, New York 13902;
Search for more papers by this author
,
Hasan Manzour
Corresponding Author
Hasan Manzour
Department of Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195;
Search for more papers by this author
,
Wanpracha Chaovalitwongse
Wanpracha Chaovalitwongse
http://orcid.org/0000-0002-8051-5981
Institute for Advanced Data Analytics, Department of Industrial Engineering, University of Arkansas, Fayetteville, Arkansas 72701
Search for more papers by this author

Corresponding Author

Daehan Won

Systems Science and Industrial Engineering Department, Binghamton University, the State University of New York, New York, New York 13902;

Search for more papers by this author

Hasan Manzour

Corresponding Author

Hasan Manzour

Department of Industrial and Systems Engineering, University of Washington, Seattle, Washington 98195;

Search for more papers by this author

Wanpracha Chaovalitwongse

http://orcid.org/0000-0002-8051-5981

Institute for Advanced Data Analytics, Department of Industrial Engineering, University of Arkansas, Fayetteville, Arkansas 72701

Search for more papers by this author

Published Online:18 Jul 2019https://doi.org/10.1287/ijoc.2018.0868

References

Amaldi E, Kann V (1998) On the approximability of minimizing nonzero variables or unsatisfied relations in linear systems. Theoret. Comput. Sci. 209(1):237–260.Crossref, Google Scholar
Assaf M, Jagannathan K, Calhoun VD, Miller L, Stevens MC, Sahl R, O’Boyle JG, Schultz RT, Pearlson GD (2010) Abnormal functional connectivity of default mode sub-networks in autism spectrum disorder patients. NeuroImage 53(1):247–256.Crossref, Google Scholar
Bach F, Jenatton R, Mairal J, Obozinski G (2011) Convex optimization with sparsity-inducing norms. Sra S, Nowozin S, Wright SJ, eds. Optimization for Machine Learning (MIT Press, Cambridge, MA), 19–53.Crossref, Google Scholar
Bach F, Obozinski G (2010) Sparse methods for machine learning theory and algorithms. ECML/PKDD Tutorial, https://www.di.ens.fr/∼fbach/ecml2010tutorial/.Google Scholar
Bertolazzi P, Felici G, Festa P, Fiscon G, Weitschek E (2016) Integer programming models for feature selection: New extensions and a randomized solution algorithm. Eur. J. Oper. Res. 250(2):389–399.Crossref, Google Scholar
Bertsimas D, King A, Mazumder R (2016) Best subset selection via a modern optimization lens. Ann. Statist. 44(2):813–852.Crossref, Google Scholar
Bhaskara A, Charikar M, Chlamtac E, Feige U, Vijayaraghavan A (2010) Detecting high log-densities: An O(n14) approximation for densest k-subgraph. Mitzenmacher M, Schulman LJ, eds. Proc. 42nd ACM Sympos. Theory Comput. (ACM, New York), 201–210.Crossref, Google Scholar
Borgwardt S, Schmiedl F (2014) Threshold-based preprocessing for approximating the weighted dense k-subgraph problem. Eur. J. Oper. Res. 234(3):631–640.Crossref, Google Scholar
Bounova G, de Weck O (2012) Overview of metrics and their correlation patterns for multiple-metric topology analysis on heterogeneous graph ensembles. Physical Rev. E 85(1):016117.Crossref, Google Scholar
Bradley PS, Mangasarian OL (1998) Feature selection via concave minimization and support vector machines. Shavlik JW, ed. Proc. 15th Internat. Conf. Machine Learn. (Morgan Kaufmann Publishers, San Francisco), 82–90.Google Scholar
Chan AB, Vasconcelos N, Lanckriet GR (2007) Direct convex relaxations of sparse SVM. Ghahramani Z. ed. Proc. 24th Internat. Conf. Machine Learn., 2007 Internat. Conf. Inductive Logical Programming, Corvallis, OR, 145–153.Crossref, Google Scholar
Chang CY (1972) Dynamic programming as applied to feature subset selection in a pattern recognition system. Donovan JJ, ed. Proc. ACM Annual Conf., vol. 1 (ACM, New York), 94–103.Crossref, Google Scholar
Chaplot S, Patnaik L, Jagannathan N (2006) Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomedical Signal Process. Control 1(1):86–92.Crossref, Google Scholar
Chen CP, Keown CL, Jahedi A, Nair A, Pflieger ME, Bailey BA, Müller RA (2015) Diagnostic classification of intrinsic functional connectivity highlights somatosensory, default mode, and visual regions in autism. NeuroImage: Clin. 8:238–245.Crossref, Google Scholar
Chen J, Ye J (2008) Training svm with indefinite kernels. McCallum A, Roweis S, eds. Proc. 25th Internat. Conf. Machine Learn. (Omin Press, Madison, WI), 136–143.Crossref, Google Scholar
Corneil DG, Perl Y (1984) Clustering and domination in perfect graphs. Discrete Appl. Math. 9(1):27–39.Crossref, Google Scholar
Craddock RC, Holtzheimer PE, Hu XP, Mayberg HS (2009) Disease state prediction from resting state functional connectivity. Magnetic Resonance Medicine 62(6):1619–1628.Crossref, Google Scholar
Erdös P, Rényi A (1959) On Random Graphs, vol. 6 (Publicationes Mathematicae, Debrecen, Hungary), 290–297.Google Scholar
Fang SC, Lin CJ, Wu SY (2001) Solving quadratic semi-infinite programming problems by using relaxed cutting-plane scheme. J. Comput. Appl. Math. 129(1):89–104.Crossref, Google Scholar
Fung G, Mangasarian OL (2000) Data selection for support vector machine classifiers. Ramakrishnan R, ed. Proc. 6th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 64–70.Crossref, Google Scholar
Guan W, Gray A, Leyffer S (2009) Mixed-integer support vector machine. LeCun Y, Zemel R, eds. NIPS Workshop on Optimization for Machine Learning (Curran Associates, Red Hook, NY).Google Scholar
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J. Machine Learn. Res. 3(March):1157–1182.Google Scholar
Han J, Sun Z, Hao H (2015) l 0-norm based structural sparse least square regression for feature selection. Pattern Recognition 48(12):3927–3940.Crossref, Google Scholar
Hettich R, Kortanek KO (1993) Semi-infinite programming: Theory, methods, and applications. SIAM Rev. 35(3):380–429.Crossref, Google Scholar
Hull JV, Jacokes ZJ, Torgerson CM, Irimia A, Van Horn JD (2016) Resting-state functional connectivity in autism spectrum disorders: A review. Frontiers Psychiatry 7:205.Crossref, Google Scholar
Jacob L, Obozinski G, Vert JP (2009) Group lasso with overlap and graph lasso. Jones B, ed. Proc. 26th Annual Internat. Conf. Machine Learn. (ACM, New York), 433–440.Crossref, Google Scholar
Jenatton R, Audibert JY, Bach F (2011) Structured variable selection with sparsity-inducing norms. J. Machine Learn. Res. 12:2777–2824.Google Scholar
Kim S, Xing EP (2012) Tree-guided group lasso for multi-response regression with structured sparsity, with an application to eQTL mapping. Ann. Appl. Statist. 6(3):1095–1117.Crossref, Google Scholar
Kim SJ, Boyd S (2008) A minimax theorem with applications to machine learning, signal processing, and finance. SIAM J. Optim. 19(3):1344–1367.Crossref, Google Scholar
Klöppel S, Abdulkadir A, Jack CR, Koutsouleris N, Mourão-Miranda J, Vemuri P (2012) Diagnostic neuroimaging across diseases. NeuroImage 61(2):457–463.Crossref, Google Scholar
Lanckriet GR, Cristianini N, Bartlett P, Ghaoui LE, Jordan MI (2004) Learning the kernel matrix with semidefinite programming. J. Machine Learn. Res. 5:27–72.Google Scholar
Li YF, Kwok JT, Tsang IW, Zhou ZH (2009a) A convex method for locating regions of interest with multi-instance learning. Altun Y, Das K, Mielikäinen T, Malerba D, Stefanowski J, Read J, Žitnik M, Ceci M, Džeroski S, eds. Machine Learning and Knowledge Discovery in Databases (Springer, New York), 15–30.Crossref, Google Scholar
Li YF, Tsang IW, Kwok JT, Zhou ZH (2009b) Tighter and convex maximum margin clustering. van Dyk D, Welling M, eds. Internat. Conf. Artificial Intelligence Statistics (Microtome, Brookline, MA), 344–351.Google Scholar
Lin D, Pitler E, Foster DP, Ungar LH (2008) In defense of ℓ0. McCallum A, Roweis S, eds. Proc. 25th Internat. Conf. Machine Learn. (Omni Press, Madison, WI).Google Scholar
Liu J, Ji S, Ye J (2009) SLEP: Sparse Learning with Efficient Projections (Arizona State University, Phoenix).Google Scholar
Lu Z, Zhang Y (2010) Penalty decomposition methods for ℓ0-norm minimization. Technical report, Simon Fraser University, BC, Canada.Google Scholar
Macambira EM, De Souza CC (2000) The edge-weighted clique problem: Valid inequalities, facets and polyhedral computations. Eur. J. Oper. Res. 123(2):346–371.Crossref, Google Scholar
Maldonado S, Pérez J, Weber R, Labbé M (2014) Feature selection for support vector machines via mixed integer linear programming. Inform. Sci. 279:163–175.Crossref, Google Scholar
McAuley J, Ming J, Stewart D, Hanna P (2005) Subband correlation and robust speech recognition. IEEE Trans. Speech Audio Process. 13(5):956–964.Crossref, Google Scholar
Mourão-Miranda J, Bokde AL, Born C, Hampel H, Stetter M (2005) Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data. NeuroImage 28(4):980–995.Crossref, Google Scholar
Natarajan BK (1995) Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2):227–234.Crossref, Google Scholar
Ng AY (2004) Feature selection, ℓ1vs. ℓ2regularization, and rotational invariance. Brodley CE, ed. Proc. 21st Internat. Conf. Machine Learn., vol. 78 (ACM, New York).Crossref, Google Scholar
Pee E, Royset JO (2011) On solving large-scale finite minimax problems using exponential smoothing. J. Optim. Theory Appl. 148(2):390–421.Crossref, Google Scholar
Power JD, Cohen AL, Nelson SM, Wig GS, Barnes KA, Church JA, Vogel AC, Laumann TO, Miezin FM, Schlaggar BL (2011) Functional network organization of the human brain. Neuron 72(4):665–678.Crossref, Google Scholar
Rakotomamonjy A (2003) Variable selection using svm based criteria. J. Machine Learn. Res. 3:1357–1370.Google Scholar
Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2007) More efficiency in multiple kernel learning. Ghahramani Z, ed. Proc. 24th Internat. Conf. Machine Learn. (Omni Press, Madison, WI), 775–782.Crossref, Google Scholar
Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2008) SimpleMKL. J. Machine Learn. Res. 9:2491–2521.Google Scholar
Rao N, Cox C, Nowak R, Rogers TT (2013) Sparse overlapping sets lasso for multitask learning and its application to fMRI analysis. Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 26 (Curran Associates, Red Hook, NY), 2202–2210.Google Scholar
Rosa MJ, Portugal L, Hahn T, Fallgatter AJ, Garrido MI, Shawe-Taylor J, Mourao-Miranda J (2015) Sparse network-based models for patient classification using fMRI. NeuroImage 105(15):493–506.Crossref, Google Scholar
Sion M (1958) On general minimax theorems. Pacific J. Math. 8(1):171–176.Crossref, Google Scholar
Sporns O (2011) Networks of the Brain (MIT Press, Cambridge).Google Scholar
Sun Z, Ampornpunt N, Varma M, Vishwanathan S (2010) Multiple kernel learning and the SMO algorithm. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 23 (Curran Associates, Red Hook, NY), 2361–2369.Google Scholar
Tan M, Wang L, Tsang IW (2010) Learning sparse svm for feature selection on very high dimensional datasets. Fürnkranz J, Joachims T, eds. Proc. 27th Internat. Conf. Machine Learn. (ICML-10) (Omni Press, Madison, WI), 1047–1054.Google Scholar
Tang J, Alelyani S, Liu H (2014) Feature selection for classification: A review. Aggaarwal CC, ed. Data Classification: Algorithms and Applications, vol. 37 (CRC Press, Boca Raton, FL).Google Scholar
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. Ser. B: Methodology 58(1):267–288.Google Scholar
Tibshirani R, Saunders M, Rosset S, Zhu J, Knight K (2005) Sparsity and smoothness via the fused lasso. J. Royal Statist. Soc.: Ser. B: Statist. Methodology 67(1):91–108.Crossref, Google Scholar
Wang H, Leng C (2008) A note on adaptive group lasso. Comput. Statist. Data Anal. 52(12):5277–5286.Crossref, Google Scholar
Wang L, Shen X (2007) On ℓ1-norm multiclass support vector machines. J. Amer. Statist. Assoc. 102(478):583–594.Crossref, Google Scholar
Weng SJ, Wiggins JL, Peltier SJ, Carrasco M, Risi S, Lord C, Monk CS (2010) Alterations of resting state functional connectivity in the default network in adolescents with autism spectrum disorders. Brain Res. 1313:202–214.Crossref, Google Scholar
Weston J, Elisseeff A, Schölkopf B, Tipping M (2003) Use of the zero-norm with linear models and kernel methods. J. Machine Learn. Res. 3:1439–1461.Google Scholar
Won D (2016) Optimization and machine learning frameworks for complex network analysis. PhD thesis, University of Washington, Seattle.Google Scholar
Xie J, Zeng L (2010) Group variable selection methods and their applications in analysis of genomic data. Feng J, Fu W, Sun F, eds. Frontiers Computational and Systems Biology (Springer, New York), 231–248.Crossref, Google Scholar
Xu Z, Jin R, Ye J, Lyu MR, King I (2009) Non-monotonic feature selection. Leon Bottou L, Littman M, eds. Proc. 26th Annual Internat. Conf. Machine Learn. (Omni Press, Madison, WI), 1145–1152.Crossref, Google Scholar
Yang C, Wan X, Yang Q, Xue H, Yu W (2010) Identifying main effects and epistatic interactions from large-scale snp data via adaptive group lasso. BMC Bioinform. 11(1):S18.Crossref, Google Scholar
Yeo BT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, et al.. (2011) The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J. Neurophysiol. 106(3):1125–1165.Crossref, Google Scholar
Yuan GX, Chang KW, Hsieh CJ, Lin CJ (2010) A comparison of optimization methods and software for large-scale l1-regularized linear classification. J. Machine Learn. Res. 11(1–4):3183–3234.Google Scholar
Yuan L, Liu J, Ye J (2011) Efficient methods for overlapping group lasso. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira F, Weinberger KQ, eds. Advances in Neural Information Processing Systems, vol. 24 (Curran Associates, Red Hook, NY), 352–360.Google Scholar
Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J. Royal Statist. Soc.: Ser. B: Statist. Methodology 68(1):49–67.Crossref, Google Scholar
Zeng LL, Shen H, Liu L, Wang L, Li B, Fang P, Zhou Z, Li Y, Hu D (2012) Identifying major depression using whole-brain functional connectivity: A multivariate pattern analysis. Brain 135(5):1498–1507.Crossref, Google Scholar
Zhang CH, Huang J (2008) The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist. 36(4)1567–1594.Crossref, Google Scholar
Zhang L, Wu SY, López MA (2010) A new exchange method for convex semi-infinite programming. SIAM J. Optim. 20(6):2959–2977.Crossref, Google Scholar
Zou H (2006) The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101(476):1418–1429.Crossref, Google Scholar

cover image INFORMS Journal on Computing

Volume 32, Issue 1

Winter 2020

Pages 1-198, C2

Article Information

Supplemental Material

Metrics

Information

Received:October 26, 2016
Accepted:August 24, 2018
Published Online:July 18, 2019

Cite as

Daehan Won, Hasan Manzour, Wanpracha Chaovalitwongse (2019) Convex Optimization for Group Feature Selection in Networked Data. INFORMS Journal on Computing 32(1):182-198.

https://doi.org/10.1287/ijoc.2018.0868

Keywords

PDF download

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Available Issues

Convex Optimization for Group Feature Selection in Networked Data

References

Volume 32, Issue 1

Article Information

Supplemental Material

Metrics

Information

Cite as

Keywords

Sign Up for INFORMS Publications Updates and News