Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining

Published Online: https://doi.org/10.1287/isre.2017.0727

References

  • Agarwal R, Dhar V (2014) Editorial—Big data, data science, and analytics: The opportunity and challenge for IS research. Inform. Systems Res. 25(3):443–448.
  • Aggarwal CC (2015) Data Mining: The Textbook (Springer, Cham, Switzerland).
  • Aggarwal R, Gopal R, Gupta A, Singh H (2012) Putting money where the mouths are: The relation between venture financing and electronic word-of-mouth. Inform. Systems Res. 23(3-part-2):976–992.
  • Agrawal A, Catalini C, Goldfarb A (2014) Some simple economics of crowdfunding. Lerner J, Stern S, eds. Innovation Policy and the Economy, 1st ed., Vol. 14 (University of Chicago Press, Chicago), 63–97.
  • Archak N, Ghose A, Ipeirotis PG (2011) Deriving the pricing power of product features by mining consumer reviews. Management Sci. 57(8):1485–1509.
  • Bao Y, Datta A (2014) Simultaneously discovering and quantifying risk types from textual risk disclosures. Management Sci. 60(6):1371–1391.
  • Buonaccorsi JP, Laake P, Veierød MB (2005) On the effect of misclassification on bias of perfectly measured covariates in regression. Biometrics 61(3):831–836.
  • Burtch G, Ghose A, Wattal S (2013) An empirical examination of the antecedents and consequences of contribution patterns in crowd-funded markets. Inform. Systems Res. 24(3):499–519.
  • Burtch G, Ghose A, Wattal S (2015) The hidden cost of accommodating crowdfunder privacy preferences: A randomized field experiment. Management Sci. 61(5):949–962.
  • Carroll RJ, Küchenhoff H, Lombard F, Stefanski LA (1996) Asymptotics for the SIMEX estimator in nonlinear measurement error models. J. Amer. Statist. Assoc. 91(433):242–250.
  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement Error in Nonlinear Models: A Modern Perspective (CRC Press, Boca Raton, FL).
  • Chan J, Wang J (2014) Hiring biases in online labor markets: The case of gender stereotyping. Proc. 35th Internat. Conf. Inform. Systems (ICIS), Auckland, NZ, 1161–1178.
  • Chen H, Chiang RH, Storey VC (2012) Business intelligence and analytics: From big data to big impact. MIS Quart. 36(4):1165–1188.
  • Cook JR, Stefanski LA (1994) Simulation-extrapolation estimation in parametric measurement error models. J. Amer. Statist. Assoc. 89(428):1314–1328.
  • Das SR, Chen MY (2007) Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Sci. 53(9):1375–1388.
  • Dellarocas C (2003) The digitization of word of mouth: Promise and challenges of online feedback. Management Sci. 49(10):1407–1424.
  • Fisher IE, Garnsey MR, Hughes ME (2016) Natural language processing in accounting, auditing and finance: A synthesis of the literature with a roadmap for future research. Intelligent Systems Accounting, Finance Management 23(3):157–214.
  • Forman C, Ghose A, Wiesenfeld B (2008) Examining the relationship between reviews and sales: The role of reviewer identity disclosure in electronic markets. Inform. Systems Res. 19(3):291–313.
  • Ghose A, Ipeirotis PG (2011) Estimating the helpfulness and economic impact of product reviews: Mining text and reviewer characteristics. Knowledge Data Engrg., IEEE Trans. 23(10):1498–1512.
  • Ghose A, Ipeirotis PG, Li B (2012) Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content. Marketing Sci. 31(3):493–520.
  • Gleser LJ (1990) Improvements of the naive approach to estimation in nonlinear errors-in-variables regression models. Contemporary Math. 112:99–114.
  • Godes D, Mayzlin D (2004) Using online conversations to study word-of-mouth communication. Marketing Sci. 23(4):545–560.
  • Goh KY, Heng CS, Lin Z (2013) Social media brand community and consumer behavior: Quantifying the relative impact of user- and marketer-generated content. Inform. Systems Res. 24(1):88–107.
  • Greene WH (2003) Econometric Analysis (Pearson Education, Delhi, India).
  • Gu B, Konana P, Rajagopalan B, Chen HM (2007) Competition among virtual communities and user valuation: The case of investing-related communities. Inform. Systems Res. 18(1):68–85.
  • Gu B, Konana P, Raghunathan R, Chen HM (2014) The allure of homophily in social media: Evidence from investor responses on virtual communities. Inform. Systems Res. 25(3):604–617.
  • Gustafson P (2003) Measurement Error and Misclassification in Statistics and Epidemiology: Impacts and Bayesian Adjustments (CRC Press, Boca Raton, FL).
  • Hardin JW, Schmiediche H, Carroll RJ (2003) The simulation extrapolation method for fitting generalized linear models with additive measurement error. Stata J. 3(4):373–385.
  • Hopkins DJ, King G (2010) A method of automated nonparametric content analysis for social science. Amer. J. Political Sci. 54(1):229–247.
  • Huang N, Hong Y, Burtch G (2017) Social network integration and user content generation: Evidence from natural experiments. MIS Quart. 41(4):1035–1058.
  • Huang N, Burtch G, Hong Y, Polman E (2016) Effects of multiple psychological distances on construal level: A field study of online reviews. J. Consumer Psych. 26(4):474–482.
  • Jelveh Z, Kogut B, Naidu S (2014) Political language in economics. Working paper, New York University, New York.
  • Johnson SL, Safadi H, Faraj S (2015) The emergence of online community leadership. Inform. Systems Res. 26(1):165–187.
  • Jurafsky D, Martin JH (2008) Speech and Language Processing (Prentice Hall, Upper Saddle River, NJ).
  • Küchenhoff H, Lederer W, Lesaffre E (2007) Asymptotic variance estimation for the misclassification SIMEX. Comput. Statist. Data Anal. 51(12):6197–6211.
  • Küchenhoff H, Mwalili SM, Lesaffre E (2006) A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics 62(1):85–96.
  • Lin M, Lucas HC Jr, Shmueli G (2013) Research commentary—Too big to fail: Large samples and the p-value problem. Inform. Systems Res. 24(4):906–917.
  • Liu Y, Chen R, Chen Y, Mei Q, Salib S (2012) I loan because…: Understanding motivations for pro-social lending. Proc. 5th ACM Internat. Conf. Web Search Data Mining (ACM, New York), 503–512.
  • Lu Y, Jerath K, Singh PV (2013) The emergence of opinion leaders in a networked online community: A dyadic model with time dynamics and a heuristic for fast estimation. Management Sci. 59(8):1783–1799.
  • Mayzlin D, Dover Y, Chevalier J (2014) Promotional reviews: An empirical investigation of online review manipulation. Amer. Econom. Rev. 104(8):2421–2455.
  • Moreno A, Terwiesch C (2014) Doing business with strangers: Reputation in online service marketplaces. Inform. Systems Res. 25(4):865–886.
  • Mudambi SM, Schuff D (2010) What makes a helpful review? A study of customer reviews on Amazon.com. MIS Quart. 34(1):185–200.
  • Pang B, Lee L, Vaithyanathan S (2002) Thumbs up? Sentiment classification using machine learning techniques. Proc. ACL-02 Conf. Empirical Methods Natural Language Processing, Vol. 10 (Association for Computational Linguistics, Stroudsburg, PA), 79–86.
  • Provost F, Fawcett T (2013) Data Science for Business: What You Need to Know About Data Mining and Data-Analytic Thinking (O’Reilly Media, Sebastopol, CA).
  • Rhue L (2015) Who gets started on Kickstarter? Demographic variations in fundraising success. Proc. 36th Internat. Conf. Inform. Systems (ICIS), Fort Worth, TX, 1303–1314.
  • Singh PV, Sahoo N, Mukhopadhyay T (2014) How to attract and retain readers in enterprise blogging? Inform. Systems Res. 25(1):35–52.
  • Stefanski LA, Cook JR (1995) Simulation-extrapolation: The measurement error jackknife. J. Amer. Statist. Assoc. 90(432):1247–1256.
  • Tetlock PC, Saar-Tsechansky M, Macskassy S (2008) More than words: Quantifying language to measure firms’ fundamentals. J. Finance 63(3):1437–1467.
  • Tirunillai S, Tellis GJ (2012) Does chatter really matter? Dynamics of user-generated content and stock performance. Marketing Sci. 31(2):198–215.
  • Vapnik V (1995) The Nature of Statistical Learning Theory (Springer, New York).
  • Varian H (2014) Big data: New tricks for econometrics. J. Econom. Perspect. 28(2):3–28.
  • Wang T, Kannan KN, Ulmer JR (2013) The association between the disclosure and the realization of information security risk factors. Inform. Systems Res. 24(2):201–218.
  • Wansbeek T, Meijer E (2000) Measurement Error and Latent Variables in Econometrics, Vol. 37 (North-Holland, Amsterdam).
  • Wu L (2013) Social network effects on productivity and job security: Evidence from the adoption of a social networking tool. Inform. Systems Res. 24(1):30–51.
  • Yin D, Bond S, Zhang H (2014) Anxious or angry? Effects of discrete emotions on the perceived helpfulness of online reviews. MIS Quart. 38(2):539–560.
  • Zhang S, Lee D, Singh P, Srinivasan K (2016) How much is an image worth? An empirical analysis of property’s image aesthetic quality on demand at AirBNB. Proc. 37th Internat. Conf. Inform. Systems (ICIS), Dublin, Ireland, 168–188.
  • Zhu H, Kraut R, Kittur A (2012) Effectiveness of shared leadership in online communities. Proc. ACM 2012 Conf. Comput. Supported Cooperative Work (ACM, New York), 407–416.
  • Zhu H, Kraut RE, Wang YC, Kittur A (2011) Identifying shared leadership in Wikipedia. Proc. SIGCHI Conf. Human Factors Comput. Systems (ACM, New York), 3431–3434.