Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining

Mochen Yang (Corresponding Author; http://orcid.org/0000-0001-5101-9041), Gediminas Adomavicius, Gordon Burtch, Yuqing Ren

Information and Decision Sciences, Carlson School of Management, University of Minnesota, Minneapolis, Minnesota 55455

Published Online: 29 Jan 2018
https://doi.org/10.1287/isre.2017.0727

Information Systems Research, Volume 29, Issue 1, March 2018, Pages iii-vi, 1-251

Article Information

Received: October 06, 2015
Accepted: April 24, 2017
Published Online: January 29, 2018

Cite as

Mochen Yang, Gediminas Adomavicius, Gordon Burtch, Yuqing Ren (2018) Mind the Gap: Accounting for Measurement Error and Misclassification in Variables Generated via Data Mining. Information Systems Research 29(1):4-24.

https://doi.org/10.1287/isre.2017.0727
