Textual Factors: A Scalable, Interpretable, and Data-Driven Approach to Analyzing Unstructured Information

Published Online: https://doi.org/10.1287/mnsc.2020.01180
