Graph-Based Feature Selection Method Under Budget Constraint for Multiclass Classification Problems

Published Online: https://doi.org/10.1287/ijds.2024.0050

References

  • Brooks JP, Edwards DJ, Larson CE, Van Cleemput N (2024) Conjecturing-based discovery of patterns in data. INFORMS J. Data Sci. 3(2):179–202.
  • Grgić-Hlača N, Zafar MB, Gummadi KP, Weller A (2016) The case for process fairness in learning: Feature selection for fair decision making. Lee DD, von Luxburg U, Garnett R, Sugiyama M, Guyon I, eds. NIPS '16 Proc. 30th Internat. Conf. Neural Inform. Processing Systems. Sympos. Machine Learn. Law (Curran Associates, Red Hook, NY).
  • Grgić-Hlača N, Zafar MB, Gummadi KP, Weller A (2018) Beyond distributive fairness in algorithmic decision making: Feature selection for procedurally fair learning. Proc. AAAI Conf. Artificial Intelligence, vol. 32, no. 1 (AAAI, Washington, DC).
  • Hruschka ER, Campello RJ, De Castro LN (2006) Evolving clusters in gene-expression data. Inform. Sci. 176(13):1898–1927.
  • Jagdhuber R, Lang M, Rahnenführer J (2020a) Feature selection methods for cost-constrained classification in random forests. Preprint, submitted August 14, https://arxiv.org/abs/2008.06298.
  • Jagdhuber R, Lang M, Stenzl A, Neuhaus J, Rahnenführer J (2020b) Cost-constrained feature selection in binary classification: Adaptations for greedy forward selection and genetic algorithms. BMC Bioinformatics 21(1):26.
  • Kachuee M, Goldstein O, Karkkainen K, Darabi S, Sarrafzadeh M (2019) Opportunistic learning: Budgeted cost-sensitive learning from data streams. Preprint, submitted January 2, https://arxiv.org/abs/1901.00243.
  • Katsevich E, Ramdas A (2018) Simultaneous high-probability bounds on the false discovery proportion in structured, regression and online settings. Preprint, submitted March 19, https://arxiv.org/abs/1803.06790.
  • Knauer R, Rodner E (2023) Cost-sensitive best subset selection for logistic regression: A mixed-integer conic optimization perspective. Seipel D, Steen A, eds. KI 2023 Adv. Artificial Intelligence 46th German Conf. AI Proc. (Springer Verlag, Berlin), 114–129.
  • Levin D, Singer G (2024) GB-AFS: Graph-based automatic feature selection for multi-class classification via Mean Simplified Silhouette. J. Big Data 11(1):79.
  • Li J, Cheng K, Wang S, Morstatter F, Trevino RP, Tang J, Liu H (2017) Feature selection: A data perspective. ACM Comput. Surveys (CSUR) 50(6):1–45.
  • Liu H, Motoda H (2012) Feature Selection for Knowledge Discovery and Data Mining, The Springer International Series in Engineering and Computer Science, vol. 454 (Springer Science & Business Media, New York).
  • Longford NT (1987) A fast scoring algorithm for maximum likelihood estimation in unbalanced mixed models with nested random effects. Biometrika 74(4):817–827.
  • Microsoft (2019) Microsoft malware prediction. Kaggle, https://www.kaggle.com/c/microsoft-malware-prediction/data.
  • Min F, Xu J (2016) Semi-greedy heuristics for feature selection with test cost constraints. Granular Comput. 1(3):199–211.
  • Min F, Hu Q, Zhu W (2014) Feature selection with test cost constraint. Internat. J. Approximate Reasoning 55(1):167–179.
  • Nguyen S, Chan R, Cadena J, Soper B, Kiszka P, Womack L, Work M, et al. (2021) Budget constrained machine learning for early prediction of adverse outcomes for COVID-19 patients. Sci. Rep. 11(1):19543.
  • NHANES (1999–2016) National health and nutrition examination survey. Survey methods and analytic guidelines. National Center for Health Statistics, Hyattsville, MD, https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx.
  • Olteanu A (2020) GTZAN dataset - Music genre classification. Kaggle, https://www.kaggle.com/datasets/andradaolteanu/gtzan-dataset-music-genre-classification.
  • Pudjihartono N, Fadason T, Kempa-Liehr AW, O’Sullivan JM (2022) A review of feature selection methods for machine learning-based disease risk prediction. Frontiers Bioinformatics 2:927312.
  • Quinlan JR (1986) Induction of decision trees. Machine Learn. 1:81–106.
  • Robnik-Šikonja M, Kononenko I (2003) Theoretical and empirical analysis of ReliefF and RReliefF. Machine Learn. 53:23–69.
  • Rousseeuw PJ (1987) Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20:53–65.
  • Sakamoto Y, Ishiguro M, Kitagawa G (1986) Akaike Information Criterion Statistics (D. Reidel, Dordrecht, Netherlands).
  • Satopaa V, Albrecht J, Irwin D, Raghavan B (2011) Finding a “kneedle” in a haystack: Detecting knee points in system behavior. 2011 31st Internat. Conf. Distributed Comput. Systems Workshops (IEEE, New York), 166–171.
  • Tsiaras T (2019) Predicting profitable customer segments. Kaggle, https://www.kaggle.com/datasets/tsiaras/predicting-profitable-customer-segments.
  • van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. J. Machine Learn. Res. 9(86):2579–2605.
  • Wang Y, Qi Q, Liu Y (2018) Unsupervised segmentation evaluation using area-weighted variance and Jeffries-Matusita distance for remote sensing images. Remote Sensing 10(8):1193.
  • Welch WJ (1982) Algorithmic complexity: Three NP-hard problems in computational statistics. J. Statist. Comput. Simulation 15(1):17–25.
  • Won D, Jansen P, Carbonell J (2020) Minimizing and recovering from the effect of concept drift via feature selection. 24th Eur. Conf. Artificial Intelligence - ECAI 2020 (IOS Press, Amsterdam), 1611–1617.
  • Yang H, Xu Z, Lyu MR, King I (2015) Budget constrained non-monotonic feature selection. Neural Networks 71:214–224.
  • Yu G, Witten D, Bien J (2022) Controlling costs: Feature selection on a budget. Stat 11(1):e427.
  • Zafar MB, Valera I, Gomez-Rodriguez M, Gummadi KP (2019) Fairness constraints: A flexible approach for fair classification. J. Machine Learn. Res. 20(75):1–42.
  • Zhang T (2011) Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theory 57(7):4689–4708.
  • Zhang Y, Cheng S, Shi Y, Gong DW, Zhao X (2019) Cost-sensitive feature selection using two-archive multi-objective artificial bee colony algorithm. Expert Systems Appl. 137:46–58.