sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics

Yi Yang
Department of Information Systems, Business Statistics and Operations Management, Hong Kong University of Science and Technology, Hong Kong

Kunpeng Zhang
https://orcid.org/0000-0002-1474-3169
Department of Decision, Operations and Information Technologies, Robert H. Smith School of Business, University of Maryland, College Park, College Park, Maryland 20742

Yangyang Fan
School of Accounting and Finance, Faculty of Business, Hong Kong Polytechnic University, Hong Kong

Published Online: 22 Mar 2022
https://doi.org/10.1287/isre.2022.1124

Information Systems Research

Volume 34, Issue 1

March 2023

Pages iii-vii, 1-397, C2

Received: May 27, 2020
Accepted: February 07, 2022
Published Online: March 22, 2022

Cite as

Yi Yang, Kunpeng Zhang, Yangyang Fan (2022) sDTM: A Supervised Bayesian Deep Topic Model for Text Analytics. Information Systems Research 34(1):137-156.

https://doi.org/10.1287/isre.2022.1124
