Rethinking Cost-Sensitive Classification in Deep Learning via Adversarial Data Augmentation

Published Online: https://doi.org/10.1287/ijds.2022.0033
