Augmenting Social Bot Detection with Crowd-Generated Labels

Published Online: https://doi.org/10.1287/isre.2022.1136
