Guided Diverse Concept Miner (GDCM): Uncovering Relevant Constructs for Managerial Insights from Text
Abstract
Guided Diverse Concept Miner (GDCM) is an interpretable deep learning algorithm that (1) automatically extracts corpus-level concepts from text data, (2) focuses concept discovery on only those concepts highly correlated with a user-specified managerial outcome, and (3) quantifies each concept’s correlational importance to that outcome. GDCM is used to explore and potentially extract previously unknown concepts and insights from text that may explain the managerial outcome, without requiring any human-predefined guidance or labeled data on concepts. GDCM embeds words, documents, and concepts in the same vector space, so a discovered concept can be interpreted by inspecting the words closest to its concept vector. GDCM is explicitly configured to increase the diversity and coherence of the recovered concepts and their relevance to managerial outcomes. We demonstrate GDCM as a “guided exploratory” tool for a hypothetical managerial case involving online purchase journey data connected to consumed reviews. GDCM scalably extracts concepts hidden in customer reviews that are highly correlated with conversion and reports concept importance in comparison with product ratings. The concepts produced turn out to be product qualities previously theorized in the literature to impact conversion, and the correlational importance gauged by GDCM closely matches estimates from a previous causal study run on a similar data set; both results serve as external validation of GDCM as a “guided exploratory” tool. Additional experiments with other data show that the extracted insights are sensitive to the guiding managerial variable, and sensibly so, further demonstrating the flexibility of GDCM as a managerial tool.
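The interpretation step described in the abstract, reading a concept through the words nearest to its vector, can be illustrated with a minimal sketch. This is not the authors' implementation: the two-dimensional vectors, the toy vocabulary, the use of cosine similarity, and the helper names `cosine` and `nearest_words` are all assumptions made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(concept_vec, word_vecs, k=2):
    """Return the k (word, similarity) pairs closest to the concept vector."""
    scored = [(word, cosine(vec, concept_vec)) for word, vec in word_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical word embeddings; a trained model would supply these.
word_vecs = {
    "battery": [0.9, 0.1],
    "charge": [0.8, 0.2],
    "price": [0.1, 0.9],
    "cheap": [0.2, 0.8],
    "screen": [0.6, 0.6],
}

# Hypothetical concept vector living in the same space as the words.
concept_vec = [1.0, 0.0]

top_words = nearest_words(concept_vec, word_vecs, k=2)
print([word for word, _ in top_words])
```

In this toy setup the concept vector sits nearest to "battery" and "charge", so a reader would label it as a battery-related concept. Because the words and the concept share one space, no separate decoding step is needed to interpret it.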
History: Ahmed Abbasi, Senior Editor; Huimin Zhao, Associate Editor.
Funding: The authors acknowledge funding from the Marketing Science Institute [Grant 4000562] and the Nvidia Academic Hardware Grant Program for research.
Supplemental Material: The online appendix is available at https://doi.org/10.1287/isre.2020.0494.