Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach

Yi Yang
https://orcid.org/0000-0001-8863-112X
Hong Kong University of Science and Technology, Kowloon, Hong Kong

Kunpeng Zhang (Corresponding Author)
https://orcid.org/0000-0002-1474-3169
University of Maryland, College Park, Maryland 20742

Yangyang Fan
https://orcid.org/0000-0002-3930-1818
Hong Kong Polytechnic University, Kowloon, Hong Kong

Published Online: 15 Apr 2021
https://doi.org/10.1287/ijoc.2020.1046

References

Adhikari A, Ram A, Tang R, Lin J (2019) Rethinking complex neural network architectures for document classification. Proc. 2019 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Tech., vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 4046–4051.
Andrzejewski D, Zhu X, Craven M (2009) Incorporating domain knowledge into topic modeling via Dirichlet forest priors. Proc. 26th Annual Internat. Conf. Machine Learn. (Association for Computing Machinery, New York), 25–32.
Arora S, Liang Y, Ma T (2017) A simple but tough-to-beat baseline for sentence embeddings. Proc. Internat. Conf. Learn. Representations.
Arora S, Li Y, Liang Y, Ma T, Risteski A (2016) A latent variable model approach to PMI-based word embeddings. Trans. Assoc. Comput. Linguist. 4:385–399.
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. Proc. Internat. Conf. Learn. Representations.
Bao Y, Datta A (2014) Simultaneously discovering and quantifying risk types from textual risk disclosures. Management Sci. 60(6):1371–1391.
Bengio Y, Courville A, Vincent P (2013) Representation learning: A review and new perspectives. IEEE Trans. Pattern Anal. Machine Intelligence 35(8):1798–1828.
Black F, Scholes M (1973) The pricing of options and corporate liabilities. J. Political Econom. 81(3):637–654.
Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J. Machine Learn. Res. 3(January):993–1022.
Bodnaruk A, Loughran T, McDonald B (2015) Using 10-K text to gauge financial constraints. J. Financial Quant. Anal. 50(4):623–646.
Boudoukh J, Feldman R, Kogan S, Richardson M (2018) Information, trading, and volatility: Evidence from firm-specific news. Rev. Financial Stud. 32(3):992–1033.
Büschken J, Allenby GM (2016) Sentence-based text analysis for customer reviews. Marketing Sci. 35(6):953–975.
Christoffersen PF, Diebold FX (2000) How relevant is volatility forecasting for financial risk management? Rev. Econom. Statist. 82(1):12–22.
Das SR, Chen MY (2007) Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Sci. 53(9):1375–1388.
Devlin J, Chang MW, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Proc. 2019 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Tech., vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 4171–4186.
Drucker H, Burges CJC, Kaufman L, Smola AJ, Vapnik V (1997) Support vector regression machines. Mozer MC, Jordan MI, Petsche T, eds. Proc. Ninth Internat. Conf. Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 155–161.
Dumas B, Kurshev A, Uppal R (2007) Equilibrium portfolio strategies in the presence of sentiment risk and excess volatility. Working Paper No. 13401, National Bureau of Economic Research, Cambridge, MA.
Dyer T, Lang M, Stice-Lawrence L (2017) The evolution of 10-K textual disclosure: Evidence from latent Dirichlet allocation. J. Accounting Econom. 64(2–3):221–245.
Faruqui M, Dodge J, Jauhar SK, Dyer C, Hovy E, Smith NA (2015) Retrofitting word vectors to semantic lexicons. Proc. 2015 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Tech. (Association for Computational Linguistics, Stroudsburg, PA), 1606–1615.
Frankel R, Johnson M, Skinner DJ (1999) An empirical examination of conference calls as a voluntary disclosure medium. J. Accounting Res. 37(1):133–150.
Gunning D (2017) Explainable artificial intelligence (XAI). Report DARPA/I2O, Defense Advanced Research Projects Agency, Arlington, VA.
Hu M, Liu B (2004) Mining and summarizing customer reviews. Proc. 10th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 168–177.
Huang AH, Lehavy R, Zang AY, Zheng R (2017) Analyst information discovery and interpretation roles: A topic modeling approach. Management Sci. 64(6):2833–2855.
Jain S, Wallace BC (2019) Attention is not explanation. Proc. 2019 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Tech., vol. 1 (Association for Computational Linguistics, Stroudsburg, PA), 3543–3556.
Jegadeesh N, Wu D (2013) Word power: A new approach for content analysis. J. Financial Econom. 110(3):712–729.
Kearney C, Liu S (2014) Textual sentiment in finance: A survey of methods and models. Internat. Rev. Financial Anal. 33(May):171–185.
Kim Y (2014) Convolutional neural networks for sentence classification. Proc. 2014 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1746–1751.
Kogan S, Levin D, Routledge BR, Sagi JS, Smith NA (2009) Predicting risk from financial reports with regression. Proc. Human Language Tech.: 2009 Annual Conf. North Amer. Chapter Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA), 272–280.
Kothari SP, Li X, Short JE (2009) The effect of disclosures by management, analysts, and business press on cost of capital, return volatility, and analyst forecasts: A study using content analysis. Accounting Rev. 84(5):1639–1670.
Larcker DF, Zakolyukina AA (2012) Detecting deceptive discussions in conference calls. J. Accounting Res. 50(2):495–540.
Li F (2010) The information content of forward-looking statements in corporate filings: A naïve Bayesian machine learning approach. J. Accounting Res. 48(5):1049–1102.
Li X, Chen K, Sun SX, Fung T, Wang H, Zeng DD (2016) A commonsense knowledge-enabled textual analysis approach for financial market surveillance. INFORMS J. Comput. 28(2):278–294.
Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1):35–65.
Loughran T, McDonald B (2013) IPO first-day returns, offer price revisions, volatility, and form S-1 language. J. Financial Econom. 109(2):307–326.
Loughran T, McDonald B (2016) Textual analysis in accounting and finance: A survey. J. Accounting Res. 54(4):1187–1230.
Mikolov T, Sutskever I, Chen K, Corrado G, Dean J (2013) Distributed representations of words and phrases and their compositionality. Proc. 26th Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 3111–3119.
Miller GA (1995) WordNet: A lexical database for English. Comm. ACM 38(11):39–41.
Pardoe D, Stone P, Saar-Tsechansky M, Keskin T, Tomak K (2010) Adaptive auction mechanism design and the incorporation of prior knowledge. INFORMS J. Comput. 22(3):353–370.
Pennebaker JW, Francis ME, Booth RJ (2001) Linguistic Inquiry and Word Count: LIWC2001 (Lawrence Erlbaum Publishers, Mahwah, NJ).
Pennington J, Socher R, Manning C (2014) GloVe: Global vectors for word representation. Proc. 2014 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1532–1543.
Poon SH, Granger C (2005) Practical issues in forecasting volatility. Financial Anal. J. 61(1):45–56.
Poon SH, Granger CW (2003) Forecasting volatility in financial markets: A review. J. Econom. Lit. 41(2):478–539.
Qin Y, Yang Y (2019) What you say and how you say it matters: Predicting stock volatility using verbal and vocal cues. Proc. 57th Annual Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA), 390–401.
Radford A, Wu J, Child R, Luan D, Amodei D, Sutskever I (2019) Language models are unsupervised multitask learners.
Rekabsaz N, Lupu M, Baklanov A, Hanbury A, Duer A, Anderson L (2017) Volatility prediction using financial disclosures sentiments with word embedding-based IR models. Proc. 55th Annual Meeting Assoc. Comput. Linguistics (Association for Computational Linguistics, Stroudsburg, PA), 1712–1721.
Schölkopf B, Simard P, Smola AJ, Vapnik V (1998) Prior knowledge in support vector kernels. Proc. 1997 Conf. Advances Neural Inform. Processing Systems (MIT Press, Cambridge, MA), 640–646.
Sharman R, Kishore R, Ramesh R (2004) Computational ontologies and information systems II: Formal specification. Comm. Assoc. Inform. Systems 14(1):9.
Spasic I, Ananiadou S, McNaught J, Kumar A (2005) Text mining and ontologies in biomedicine: Making sense of raw text. Briefings Bioinform. 6(3):239–251.
Sridharan SA (2015) Volatility forecasting using financial statement information. Accounting Rev. 90(5):2079–2106.
Tetlock PC (2007) Giving content to investor sentiment: The role of media in the stock market. J. Finance 62(3):1139–1168.
Tsai MF, Wang CJ (2014) Financial keyword expansion via continuous word vector representations. Proc. 2014 Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 1453–1458.
Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. Proc. 31st Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 5998–6008.
Wagstaff K, Cardie C, Rogers S, Schrödl S (2001) Constrained K-means clustering with background knowledge. Proc. 18th Internat. Conf. Machine Learn., vol. 1 (Morgan Kaufmann, San Francisco), 577–584.
Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. Proc. 2016 Conf. North Amer. Chapter Assoc. Comput. Linguistics: Human Language Tech. (Association for Computational Linguistics, Stroudsburg, PA), 1480–1489.
You H, Zhang XJ (2009) Financial reporting complexity and investor underreaction to 10-K information. Rev. Accounting Stud. 14(4):559–586.
Zhang D, Zhou ZH, Chen S (2007) Semi-supervised dimensionality reduction. Proc. SIAM Conf. Data Mining (Society for Industrial and Applied Mathematics, Philadelphia), 629–634.

INFORMS Journal on Computing

Volume 34, Issue 1

January-February 2022

Pages 1-669, C2

Article Information

Received: July 03, 2019
Accepted: October 27, 2020

Cite as

Yi Yang, Kunpeng Zhang, Yangyang Fan (2021) Analyzing Firm Reports for Volatility Prediction: A Knowledge-Driven Text-Embedding Approach. INFORMS Journal on Computing 34(1):522-540.

https://doi.org/10.1287/ijoc.2020.1046
