An Ensemble Learning Approach with Gradient Resampling for Class-Imbalance Problems

Published Online: https://doi.org/10.1287/ijoc.2023.1274

References

  • Abdi L, Hashemi S (2015) To combat multi-class imbalanced problems by means of over-sampling techniques. IEEE Trans. Knowledge Data Engrg. 28(1):238–251.
  • Agarwal S, Chowdary CR (2020) A-stacking and a-bagging: Adaptive versions of ensemble learning algorithms for spoof fingerprint detection. Expert Systems Appl. 146:113160.
  • Ahmed S, Mahbub A, Rayhan F, Jani R, Shatabda S, Farid DM (2017) Hybrid methods for class imbalance learning employing bagging with sampling techniques. Proc. 2nd Internat. Conf. on Comput. Systems and Inform. Tech. for Sustainable Solution (CSITSS), (IEEE, New York), 1–5.
  • Asuncion A, Newman D (2007) UCI machine learning repository. University of California, Irvine, School of Information and Computer Sciences. http://www.ics.uci.edu/mlearn/MLRepository.html.
  • Bagui S, Li K (2021) Resampling imbalanced data for network intrusion detection data sets. J. Big Data 8(1):6.
  • Bernardo A, Della Valle E (2021) Vfc-smote: Very fast continuous synthetic minority oversampling for evolving data streams. Data Mining Knowledge Discovery 35(11):2679–2713.
  • Bi J, Zhang C (2018) An empirical comparison on state-of-the-art multi-class imbalance learning algorithms and a new diversified ensemble learning scheme. Knowledge Base Systems 158:81–93.
  • Bradley AP (1997) The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognition 30(7):1145–1159.
  • Branco P, Torgo L, Ribeiro RP (2016) A survey of predictive modeling on imbalanced domains. ACM Comput. Survey 49(2):1–50.
  • Breiman L (1996) Bagging predictors. Machine Learn. 24(2):123–140.
  • Cao C, Wang Z (2018) Imcstacking: Cost-sensitive stacking learning with feature inverse mapping for imbalanced problems. Knowledge Base Systems 150:27–37.
  • Chan KY, Kwong C, Kremer GE (2020) Predicting customer satisfaction based on online reviews and hybrid ensemble genetic programming algorithms. Engrg. Appl. Artificial Intelligence 95:103902.
  • Charte F, Rivera AJ, del Jesus MJ, Herrera F (2019) Dealing with difficult minority labels in imbalanced multilabel data sets. Neurocomput. 326:39–53.
  • Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2011) Smote: Synthetic minority over-sampling technique. J. Artificial Intelligence Res. 16(1):321–357.
  • Cherian A, Wang J (2022) Generalized one-class learning using pairs of complementary classifiers. IEEE Trans. Pattern Anal. Machine Intelligence 44(10):6993–7009.
  • Cruz RM, Oliveira DV, Cavalcanti GD, Sabourin R (2019) Fire-des++: Enhanced online pruning of base classifiers for dynamic ensemble selection. Pattern Recognition 85:149–160.
  • Dong X, Yu Z, Cao W, Shi Y, Ma Q (2020) A survey on ensemble learning. Frontiers Comput. Sci. 14(2):241–258.
  • Englhardt A, Trittenbach H, Vetter D, Böhm K (2020) Finding the sweet spot: Batch selection for one-class active learning. Demeniconi C, Chawla N, eds. Proc. SIAM Internat. Conf. on Data Mining (SIAM, Philadelphia), 118–126.
  • Fernández A, García S, Galar M, Prati RC, Krawczyk B, Herrera F (2018) Cost-sensitive learning. Learning from Imbalanced Data Sets (Springer, Berlin), 63–78.
  • Gao L, Zhang L, Liu C, Wu S (2020) Handling imbalanced medical image data: A deep-learning-based one-class classification approach. Artificial Intelligence Medicine 108:101935.
  • García V, Sánchez JS, Mollineda RA (2012) On the effectiveness of preprocessing methods when dealing with different levels of class imbalance. Knowledge Base Systems 25(1):13–21.
  • Ge R, Feng J, Gu B, Zhang P (2017) Predicting and deterring default with social media information in peer-to-peer lending. J. Management Inform. Systems 34(2):401–424.
  • Guzmán-Ponce A, Sánchez JS, Valdovinos RM, Marcial-Romero JR (2021) Dbig-us: A two-stage under-sampling algorithm to face the class imbalance problem. Expert Systems Appl. 168:114301.
  • Junsomboon N, Phienthrakul T (2017) Combining over-sampling and under-sampling techniques for imbalance data set. Proc. 9th Internat. Conf. on Machine Learn. and Comput. ICMLC 2017 (Association for Computing Machinery, New York), 243–247.
  • Jussupow E, Spohrer K, Heinzl A, Gawlitza J (2021) Augmenting medical diagnosis decisions? An investigation into physicians' decision-making process with artificial intelligence. Inform. Systems Res. 32(3):713–735.
  • Kabir A, Ruiz C, Alvarez SA (2018) Mixed bagging: A novel ensemble learning framework for supervised classification based on instance hardness. Proc. IEEE Internat. Conf. Data Mining (ICDM), (IEEE, New York), 1073–1078.
  • Kadkhodaei HR, Moghadam AME, Dehghan M (2020) Hboost: A heterogeneous ensemble classifier based on the boosting method and entropy measurement. Expert Systems Appl. 157:113482.
  • Kang Q, Chen X, Li S, Zhou M (2016) A noise-filtered under-sampling scheme for imbalanced classification. IEEE Trans. Cybernetics 47(12):4263–4274.
  • Kang Q, Shi L, Zhou M, Wang X, Wu Q, Wei Z (2017) A distance-based weighted undersampling scheme for support vector machines and its application to imbalanced classification. IEEE Trans. Neural Networks Learn. Systems 29(9):4152–4165.
  • Kauffmann J, Müller KR, Montavon G (2020) Toward explaining anomalies: A deep Taylor decomposition of one-class models. Pattern Recognition 101:107198.
  • Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R (2017) Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans. Neural Networks Learn. Systems 29(8):3573–3587.
  • Khoshgoftaar TM, Rebours P (2007) Improving software quality prediction by noise filtering techniques. J. Comput. Sci. Tech. 22(3):387–396.
  • Kovács G (2019) Smote-variants: A python implementation of 85 minority oversampling techniques. Neurocomput. 366:352–354.
  • Koziarski M, Krawczyk B, Woźniak M (2019) Radial-based oversampling for noisy imbalanced data classification. Neurocomput. 343:19–33.
  • Leevy JL, Khoshgoftaar TM, Bauder RA, Seliya N (2018) A survey on addressing high-class imbalance in big data. J. Big Data 5(1):1–30.
  • Li B, Liu Y, Wang X (2019) Gradient harmonized single-stage detector. Proc. Conf. AAAI Artificial Intelligence (AAAI Press, Palo Alto, CA), 33:8577–8584.
  • Liang X, Jiang A, Li T, Xue Y, Wang G (2020) Lr-smote—An improved unbalanced data set oversampling based on k-means and svm. Knowledge Base Systems 196:105845.
  • Lim P, Goh CK, Tan KC (2016) Evolutionary cluster-based synthetic oversampling ensemble (eco-ensemble) for imbalance learning. IEEE Trans. Cybernetics 47(9):2850–2861.
  • Lin TY, Goyal P, Girshick R, He K, Dollár P (2017a) Focal loss for dense object detection. IEEE Trans. Pattern Analysis Machine Intelligence 99:2999–3007.
  • Lin WC, Tsai CF, Hu YH, Jhang JS (2017b) Clustering-based undersampling in class-imbalanced data. Inform. Sci. 409:17–26.
  • Liu B, Tsoumakas G (2020) Dealing with class imbalance in classifier chains via random undersampling. Knowledge Base Systems 192:105292.
  • Liu XY, Wu J, Zhou ZH (2009) Exploratory undersampling for class-imbalance learning. IEEE Trans. Systems Man Cybernetics B Cybernetics 39(2):539–550.
  • Liu Z, Wei P, Jiang J, Cao W, Bian J, Chang Y (2020a) Mesa: Boost ensemble imbalanced learning with meta-sampler. Adv. Neural Inform. Processing Systems 33:14463–14474.
  • Liu Z, Cao W, Gao Z, Bian J, Chen H, Chang Y, Liu TY (2020b) Self-paced ensemble for highly imbalanced massive data classification. Proc. IEEE 36th Internat. Conf. on Data Engrg. (ICDE) (IEEE, New York), 841–852.
  • Low R, Cheah L, You L (2020) Commercial vehicle activity prediction with imbalanced class distribution using a hybrid sampling and gradient boosting approach. IEEE Trans. Intelligent Transportation Systems 22(3):1401–1410.
  • Ng WW, Hu J, Yeung DS, Yin S, Roli F (2014) Diversified sensitivity-based undersampling for imbalance classification problems. IEEE Trans. Cybernetics 45(11):2402–2412.
  • Oksuz K, Cam BC, Kalkan S, Akbas E (2020) Imbalance problems in object detection: A review. IEEE Trans. Pattern Anal. Machine Intelligence 43(10):3388–3415.
  • Pereira RM, Costa YM, Silla CN Jr (2020) Mltl: A multi-label approach for the tomek link undersampling algorithm. Neurocomput. 383:95–105.
  • Pereira RM, Costa YM, Silla CN Jr (2021) Toward hierarchical classification of imbalanced data using random resampling algorithms. Inform. Sci. 578:344–363.
  • Perera P, Patel VM (2019) Learning deep features for one-class classification. IEEE Trans. Image Processing 28(11):5450–5463.
  • Razavi-Far R, Farajzadeh-Zanjani M, Wang B, Saif M, Chakrabarti S (2021) Imputation-based ensemble techniques for class imbalance learning. IEEE Trans. Knowledge Data Engrg. 33(5):1988–2001.
  • Roshan SE, Asadi S (2020) Improvement of bagging performance for classification of imbalanced data sets using evolutionary multi-objective optimization. Engrg. Appl. Artificial Intelligence 87:103319.
  • Rout N, Mishra D, Mallick MK (2018) Handling imbalanced data: A survey. Reddy MS, Viswanath K, KM SP, eds. Internat. Proc. Adv. Soft Comput., Intelligent Systems Appl. (Springer, Berlin), 431–443.
  • Roy A, Qureshi S, Pande K, Nair D, Gairola K, Jain P, Singh S, et al. (2019) Performance comparison of machine learning platforms. INFORMS J. Comput. 31(2):207–225.
  • Sáez JA, Luengo J, Stefanowski J, Herrera F (2015) Smote–ipf: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering. Inform. Sci. 291:184–203.
  • Schapire RE (1999) A brief introduction to boosting. Proc. 16th Internat. Joint Conf. Artificial Intelligence, IJCAI’99, vol. 2 (Morgan Kaufmann Publishers Inc., San Francisco), 1401–1406.
  • Seiffert C, Khoshgoftaar TM, Van Hulse J, Napolitano A (2009) Rusboost: A hybrid approach to alleviating class imbalance. IEEE Trans. Systems Man Cybernetics A Systems Human 40(1):185–197.
  • Sen A, Islam MM, Murase K, Yao X (2015) Binarization with boosting and oversampling for multiclass classification. IEEE Trans. Cybernetics 46(5):1078–1091.
  • Seng Z, Kareem SA, Varathan KD (2021) A neighborhood undersampling stacked ensemble (nus-se) in imbalanced classification. Expert Systems Appl. 168:114246.
  • Shao J, Wang Q, Liu F (2019) Learning to sample: An active learning framework. Proc. IEEE Internat. Conf. on Data Mining (IEEE, New York), 538–547.
  • Sheng J, Shi Y, Zhang Q (2021) Improved parallel magnetic resonance imaging reconstruction with multiple variable density sampling. Sci. Rep. 11(1):1–15.
  • Sowah RA, Kuditchar B, Mills GA, Acakpovi A, Twum RA, Buah G, Agboyi R (2021) Hcbst: An efficient hybrid sampling technique for class imbalance problems. ACM Trans. Knowledge Discovery Data 16(3):1–37.
  • Sun J, Lang J, Fujita H, Li H (2018) Imbalanced enterprise credit evaluation with dte-sbd: Decision tree ensemble based on smote and bagging with differentiated sampling rates. Inform. Sci. 425:76–91.
  • Taherkhani A, Cosma G, McGinnity TM (2020) Adaboost-cnn: An adaptive boosting algorithm for convolutional neural networks to classify multi-class imbalanced data sets using transfer learning. Neurocomput. 404:351–366.
  • Tao X, Li Q, Guo W, Ren C, Li C, Liu R, Zou J (2019) Self-adaptive cost weights-based support vector machine cost-sensitive ensemble for imbalanced data classification. Inform. Sci. 487:31–56.
  • Tarekegn A, Giacobini M, Michalak K (2021) A review of methods for imbalanced multi-label classification. Pattern Recognition 118:107965.
  • Tomek I (1976) Two modifications of cnn. IEEE Trans. Syst. Man Cybern. 6(11):769–772.
  • Veganzones D, Séverin E (2018) An investigation of bankruptcy prediction in imbalanced data sets. Decision Support Systems 112:111–124.
  • Xie Y, Qiu M, Zhang H, Peng L, Chen Z (2022) Gaussian distribution based oversampling for imbalanced data classification. IEEE Trans. Knowledge Data Engrg. 34(2):667–679.
  • Xu Z, Shen D, Nie T, Kou Y, Yin N, Han X (2021) A cluster-based oversampling algorithm combining smote and k-means for imbalanced medical data. Inform. Sci. 572:574–589.
  • Yin J, Gan C, Zhao K, Lin X, Quan Z, Wang ZJ (2020) A novel model for imbalanced data classification. Proc. Conf. AAAI Artificial Intelligence, vol. 34 (AAAI Press, Palo Alto, CA), 6680–6687.
  • Yu L, Zhou R, Tang L, Chen R (2018) A dbn-based resampling svm ensemble learning paradigm for credit classification with imbalanced data. Appl. Soft Comput. 69:192–202.
  • Zhao H, Zhao C, Zhang X, Liu N, Zhu H, Liu Q, Xiong H (2022) An ensemble learning approach with gradient resampling for class-imbalance problems v2021.0104 URL http://dx.doi.org/10.5281/zenodo.6360996, available for download at https://github.com/INFORMSJoC/2021.0104.
  • Zheng M, Li T, Zheng X, Yu Q, Chen C, Zhou D, Lv C, Yang W (2021) Uffdfr: Undersampling framework with denoising, fuzzy c-means clustering, and representative sample selection for imbalanced data classification. Inform. Sci. 576:658–680.
  • Zhu X, Yang J, Zhang C, Zhang S (2021) Efficient utilization of missing data in cost-sensitive learning. IEEE Trans. Knowledge Data Engrg. 33(6):2425–2436.
  • Zhu Z, Wang Z, Li D, Zhu Y, Du W (2018) Geometric structural ensemble learning for imbalanced problems. IEEE Trans. Cybernetics 50(4):1617–1629.