Guided Diverse Concept Miner (GDCM): Uncovering Relevant Constructs for Managerial Insights from Text

Dokyun “DK” Lee (Corresponding Author)
https://orcid.org/0000-0002-3186-3349
Questrom School of Business, Boston University, Boston, Massachusetts 02215

Zhaoqi “ZQ” Cheng
https://orcid.org/0000-0003-3295-0484
Questrom School of Business, Boston University, Boston, Massachusetts 02215

Chengfeng Mao
https://orcid.org/0009-0002-6679-7764
Marketing, Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142

Emaad Manzoor
Marketing, Samuel Curtis Johnson Graduate School of Management, Cornell University, Ithaca, New York 14853

Published Online: 10 May 2024
https://doi.org/10.1287/isre.2020.0494


Information Systems Research, Volume 36, Issue 1, March 2025, Pages iii–xi, 1–646, C2

Article Information


Received: September 24, 2020
Accepted: March 16, 2024
Published Online: May 10, 2024

Cite as

Dokyun “DK” Lee, Zhaoqi “ZQ” Cheng, Chengfeng Mao, Emaad Manzoor (2024) Guided Diverse Concept Miner (GDCM): Uncovering Relevant Constructs for Managerial Insights from Text. Information Systems Research 36(1):370–393.

https://doi.org/10.1287/isre.2020.0494


Acknowledgments

The authors thank Eric Zhou, David Blei, Olivier Toubia, Oded Netzer, Jey Han Lau, Arun Rai, Gedas Adomavicius, Sudhir K., Carl Mela, Christophe Van Den Bulte, Raghu Iyengar, Eric Bradlow, Ryan Dew, Alex Burnap, Mingfeng Lin, Panos Ipeirotis, D. J. Wu, Kunpeng Zhang, Daehwan Ahn, Alan Montgomery, Lan Luo, Dinesh Puranam, George Chen, Lizhen Xu, John McCoy, Eric Schwartz, Fred Feinberg, Anocha Aribarg, and Puneet Manchanda for very helpful comments or conversations that shaped the paper. The authors also thank participants in the Marketing Science Conferences 2018 and 2019; the Conference on Information Systems and Technology 2018; the Conference on Digital Marketing and Machine Learning 2018; the Advanced Computing in Social Sciences Symposium 2019; the Choice Symposium 2019; INFORMS 2019; the Conference on Artificial Intelligence, Machine Learning, and Digital Analytics 2019; the 2019 Korean Chapter of the Association for Information Systems (KrAIS) Research Workshop at the International Conference on Information Systems (ICIS); and Wharton Behavioral Insights Through Text 2020, as well as seminar audiences at McGill University, Korea Advanced Institute of Science and Technology, Seoul National University, the University of Pittsburgh, the University of Southern California, the University of Minnesota, HEC Paris, the University of Maryland, Georgia Institute of Technology, Harvard University, the University of Michigan, The Wharton School, and Rutgers University, for their helpful comments and conversations.
