
Articles In Advance

Received: December 30, 2023
Accepted: April 02, 2026
Published Online: May 13, 2026

Cite as

Xingwei Zhang, Hu Tian, Xiaolong Zheng, Jing Peng, Daniel Dajun Zeng (2026) Semantic Aggregated Adversarial Training Framework for Hate Speech Detection. INFORMS Journal on Computing 0(0).

https://doi.org/10.1287/ijoc.2023.0508
