Correcting Misclassification Bias in Regression Models with Variables Generated via Data Mining

Published Online: https://doi.org/10.1287/isre.2020.0977
