Graph-Based Feature Selection Method Under Budget Constraint for Multiclass Classification Problems

David Levin (Corresponding Author)
[email protected]
https://orcid.org/0009-0007-2220-4070
Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel

Gonen Singer
[email protected]
https://orcid.org/0000-0002-2610-9579
Faculty of Engineering, Bar-Ilan University, Ramat Gan 5290002, Israel

Published Online: 5 June 2025 · https://doi.org/10.1287/ijds.2024.0050

INFORMS Journal on Data Science

Volume 4, Issue 3

July-September 2025

Pages iii-vi, 197-282, ii

Received: September 28, 2024
Accepted: May 4, 2025
Published Online: June 5, 2025

Cite as

David Levin, Gonen Singer (2025) Graph-Based Feature Selection Method Under Budget Constraint for Multiclass Classification Problems. INFORMS Journal on Data Science 4(3):265-282.

https://doi.org/10.1287/ijds.2024.0050
