Augmenting Social Bot Detection with Crowd-Generated Labels

Victor Benjamin
Corresponding Author
[email protected]
https://orcid.org/0000-0002-1834-6064
Department of Information Systems, Arizona State University, Tempe, Arizona 85287
T. S. Raghu
[email protected]
https://orcid.org/0000-0003-1071-6339
Department of Information Systems, Arizona State University, Tempe, Arizona 85287

Published Online: 12 May 2022, https://doi.org/10.1287/isre.2022.1136

Information Systems Research

Volume 34, Issue 2

June 2023

Received: July 22, 2019
Accepted: April 04, 2022
Published Online: May 12, 2022

Cite as

Victor Benjamin, T. S. Raghu (2022) Augmenting Social Bot Detection with Crowd-Generated Labels. Information Systems Research 34(2):487-507.

https://doi.org/10.1287/isre.2022.1136
