Correcting Misclassification Bias in Regression Models with Variables Generated via Data Mining
Published Online: 26 Mar 2021
https://doi.org/10.1287/isre.2020.0977

