How Much Can Machines Learn Finance from Chinese Text Data?

Published Online: https://doi.org/10.1287/mnsc.2022.01468

References

  • Ahn SC, Horenstein AR (2013) Eigenvalue ratio test for the number of factors. Econometrica 81(3):1203–1227.
  • Antweiler W, Frank MZ (2004) Is all that talk just noise? The information content of internet stock message boards. J. Finance 59(3):1259–1294.
  • Arkhangelsky D, Athey S, Hirshberg DA, Imbens GW, Wager S (2021) Synthetic difference-in-differences. Amer. Econom. Rev. 111(12):4088–4118.
  • Bai Z, Ding X (2012) Estimation of spiked eigenvalues in spiked models. Random Matrices Theory Appl. 1(02):1–21.
  • Bai J, Ng S (2002) Determining the number of factors in approximate factor models. Econometrica 70(1):191–221.
  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J. Machine Learn. Res. 3(January):993–1022.
  • Calomiris CW, Mamaysky H (2019) How news and its context drive risk and returns around the world. J. Financial Econom. 133(2):299–336.
  • Carhart MM (1997) On persistence in mutual fund performance. J. Finance 52(1):57–82.
  • Chen Y (2015) Convolutional neural network for sentence classification. UWSpace (August 26), https://uwspace.uwaterloo.ca/handle/10012/9592.
  • Chen J, Jiang F, Tu J (2015) Asset allocation in the Chinese stock market: The role of return predictability. J. Portfolio Management 41(5):71–83.
  • Chen T, Gao Z, He J, Jiang W, Xiong W (2019) Daily price limits and destructive market behavior. J. Econometrics 208(1):249–264.
  • Cong LW, Liang T, Zhang X (2019) Textual factors: A scalable, interpretable, and data-driven approach to analyzing unstructured information. Preprint, submitted September 1, https://dx.doi.org/10.2139/ssrn.3307057.
  • Cowles A (1933) Can stock market forecasters forecast? Econometrica 1(3):309–324.
  • Da Z, Engelberg J, Gao P (2015) The sum of all FEARS investor sentiment and asset prices. Rev. Financial Stud. 28(1):1–32.
  • Deng K, Bol PK, Li KJ, Liu JS (2016) On the unsupervised analysis of domain-specific Chinese texts. Proc. Natl. Acad. Sci. USA 113(22):6154–6159.
  • Devlin J, Chang M-W, Lee K, Toutanova K (2019) BERT: Pre-training of deep bidirectional transformers for language understanding. Preprint, submitted May 24, https://arxiv.org/abs/1810.04805.
  • Du Z, Huang AG, Wermers R, Wu W (2022) Language and domain specificity: A Chinese financial sentiment dictionary. Rev. Finance 26(3):673–719.
  • Fama EF, French KR (1993) Common risk factors in the returns on stocks and bonds. J. Financial Econom. 33(1):3–56.
  • Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J. Roy. Statist. Soc. Ser. B Statist. Methodology 70(5):849–911.
  • Fan J, Guo J, Zheng S (2020a) Estimating number of factors by adjusted eigenvalues thresholding. J. Amer. Statist. Assoc. 117(538):852–861.
  • Fan J, Ke Y, Wang K (2020b) Factor-adjusted regularized model selection. J. Econometrics 216(1):71–85.
  • Fan J, Li R, Zhang C-H, Zou H (2020c) Statistical Foundations of Data Science (CRC Press, Boca Raton, FL).
  • Gao Z, Ren H, Zhang B (2020) Googling investor sentiment around the world. J. Financial Quant. Anal. 55(2):549–580.
  • García D (2013) Sentiment during recessions. J. Finance 68(3):1267–1300.
  • Gentzkow M, Kelly B, Taddy M (2019a) Text as data. J. Econom. Literature 57(3):535–574.
  • Gentzkow M, Shapiro JM, Taddy M (2019b) Measuring group differences in high-dimensional choices: Method and application to congressional speech. Econometrica 87(4):1307–1340.
  • Glasserman P, Mamaysky H (2019) Does unusual news forecast market stress? J. Financial Quant. Anal. 54(5):1937–1974.
  • Gu S, Kelly B, Xiu D (2020) Empirical asset pricing via machine learning. Rev. Financial Stud. 33(5):2223–2273.
  • Henry E (2008) Are investors influenced by how earnings press releases are written? J. Bus. Comm. 45(4):363–407.
  • Horel E, Giesecke K (2020) Significance tests for neural networks. J. Machine Learn. Res. 21(227):1–29.
  • Jegadeesh N, Wu D (2013) Word power: A new approach for content analysis. J. Financial Econom. 110(3):712–729.
  • Ke ZT, Kelly BT, Xiu D (2019) Predicting returns with text data. NBER Working Paper No. 26186, National Bureau of Economic Research, Cambridge, MA.
  • Larsen V, Thorsrud LA (2017) Asset returns, news topics, and media effects. Preprint, submitted September 19, https://dx.doi.org/10.2139/ssrn.3057950.
  • Loughran T, McDonald B (2011) When is a liability not a liability? Textual analysis, dictionaries, and 10-Ks. J. Finance 66(1):35–65.
  • Loughran T, McDonald B (2016) Textual analysis in accounting and finance: A survey. J. Accounting Res. 54(4):1187–1230.
  • Manela A, Moreira A (2017) News implied volatility and disaster concerns. J. Financial Econom. 123(1):137–162.
  • Nagel S (2005) Short sales, institutional investors and the cross-section of stock returns. J. Financial Econom. 78(2):277–309.
  • Nagel S (2021) Machine Learning in Asset Pricing (Princeton University Press, Princeton, NJ).
  • Schuster M, Paliwal KK (1997) Bidirectional recurrent neural networks. IEEE Trans. Signal Processing 45(11):2673–2681.
  • Stock JH, Watson MW (2002) Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97(460):1167–1179.
  • Sun J (2017) Jieba Version v0.39 (August 31). https://github.com/fxsjy/jieba.
  • Sun L, Najand M, Shen J (2016) Stock return predictability and investor sentiment: A high-frequency perspective. J. Banking Finance 73(11):147–164.
  • Taddy M (2013) Multinomial inverse regression for text analysis. J. Amer. Statist. Assoc. 108(503):755–770.
  • Tetlock PC (2007) Giving content to investor sentiment: The role of media in the stock market. J. Finance 62(3):1139–1168.
  • Tetlock PC, Saar-Tsechansky M, Macskassy S (2008) More than words: Quantifying language to measure firms’ fundamentals. J. Finance 63(3):1437–1467.