Textual Factors: A Scalable, Interpretable, and Data-Driven Approach to Analyzing Unstructured Information

