Cost-Effective Quality Assurance in Crowd Labeling

Published Online: https://doi.org/10.1287/isre.2016.0661

References

  • Adomavicius G, Gupta A, Zhdanov D (2009) Designing intelligent software agents for auctions with limited information feedback. Inform. Systems Res. 20(4):507–526.
  • Aperjis C, Johari R (2010) Optimal windows for aggregating ratings in electronic marketplaces. Management Sci. 56(5):864–880.
  • Archak N, Ghose A, Ipeirotis PG (2011) Deriving the pricing power of product features by mining consumer reviews. Management Sci. 57(8):1485–1509.
  • Bachrach Y, Graepel T, Minka T, Guiver J (2012) How to grade a test without knowing the answers—A Bayesian graphical model for adaptive crowdsourcing and aptitude testing. Proc. 29th Internat. Conf. Machine Learning (Omnipress, Madison, WI), 1183–1190.
  • Berger RL (1982) Multiparameter hypothesis testing and acceptance sampling. Technometrics 24(4):295–300.
  • Carpenter B (2008) Multilevel Bayesian models of categorical data annotation. https://lingpipe.files.wordpress.com/2008/11/carp-bayesian-multilevel-annotation.pdf.
  • Chen X, Lin Q, Zhou D (2015) Statistical decision making for optimal budget allocation in crowd labeling. J. Machine Learning Res. 16:1–46.
  • Chiang IR, Mookerjee VS (2004) A fault threshold policy to manage software development projects. Inform. Systems Res. 15(1):3–21.
  • Christoforaki M, Ipeirotis P (2014) STEP: A scalable testing and evaluation platform. Second AAAI Conf. Human Comput. Crowdsourcing (AAAI Press, Palo Alto, CA), 41–49.
  • Cohn D, Atlas L, Ladner R (1994) Improving generalization with active learning. Machine Learning 15(2):201–221.
  • Crisan D, Doucet A (2002) A survey of convergence results on particle filtering methods for practitioners. IEEE Trans. Signal Processing 50(3):736–746.
  • Crocker L, Algina J (2006) Introduction to Classical and Modern Test Theory (Wadsworth, Belmont, CA).
  • Dawid AP, Skene AM (1979) Maximum likelihood estimation of observer error-rates using the EM algorithm. Appl. Statist. 28(1):20–28.
  • DeMars C (2010) Item Response Theory (Oxford University Press, Oxford, UK).
  • Donmez P, Carbonell J, Schneider J (2010) A probabilistic framework to learn from multiple annotators with time-varying accuracy. Proc. 10th SIAM Internat. Conf. Data Mining (SDM) (SIAM, Philadelphia), 826–837.
  • Goes PB (2014) Editor’s comments: Design science research in top information systems journals. MIS Quart. 38(1):iii–viii.
  • Gregor S, Hevner AR (2013) Positioning and presenting design science research for maximum impact. MIS Quart. 37(2):337–356.
  • Hevner AR, March ST, Park J, Ram S (2004) Design science in information systems research. MIS Quart. 28(1):75–105.
  • Ho CJ, Slivkins A, Suri S, Vaughan JW (2015) Incentivizing high quality crowdwork. Proc. 24th Internat. Conf. World Wide Web (ACM, New York), 419–429.
  • Ipeirotis PG (2010) Analyzing the Amazon Mechanical Turk marketplace. XRDS: Crossroads, ACM Magazine Students 17(2):16–21.
  • Ipeirotis PG, Provost F, Wang J (2010) Quality management on Amazon Mechanical Turk. Proc. ACM SIGKDD Workshop on Human Comput. (ACM, New York), 64–67.
  • Ipeirotis PG, Provost F, Sheng VS, Wang J (2014) Repeated labeling using multiple noisy labelers. Data Mining Knowledge Discovery 28(2):402–441.
  • Karger DR, Oh S, Shah D (2011) Iterative learning for reliable crowdsourcing systems. Proc. 24th Internat. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 1953–1961.
  • Ketter W, Collins J, Gini M, Gupta A, Schrater P (2012) Real-time tactical and strategic sales management for intelligent agents guided by economic regimes. Inform. Systems Res. 23(4):1263–1283.
  • Kokkodis M, Ipeirotis PG (2015) Reputation transferability in online labor markets. Management Sci. 62(6):1687–1706.
  • Kuechler W, Vaishnavi V (2012) A framework for theory development in design science research: Multiple perspectives. J. Assoc. Inform. Systems 13(6):395–423.
  • Lewis DD, Gale WA (1994) A sequential algorithm for training text classifiers. Proc. 17th Annual Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (Springer, New York), 3–12.
  • Lizotte DJ, Madani O, Greiner R (2003) Budgeted learning of naive-Bayes classifiers. Proc. 19th Conf. Uncertainty Artificial Intelligence (Morgan Kaufmann, San Francisco), 378–385.
  • Malone TW, Laubacher R, Dellarocas C (2009) Harnessing crowds: Mapping the genome of collective intelligence. Working paper, Massachusetts Institute of Technology, Cambridge, http://ssrn.com/abstract=1381502.
  • March ST, Storey VC (2008) Design science in the information systems discipline: An introduction to the special issue on design science research. MIS Quart. 32(4):725–730.
  • Moore JC, Whinston AB (1986) A model of decision-making with sequential information-acquisition (part 1). Decision Support Systems 2(4):285–307.
  • Moore JC, Whinston AB (1987) A model of decision-making with sequential information-acquisition (part 2). Decision Support Systems 3(1):47–72.
  • Moreno A, Terwiesch C (2014) Doing business with strangers: Reputation in online service marketplaces. Inform. Systems Res. 25(4):865–886.
  • Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J. Machine Learning Res. 11(April):1297–1322.
  • Roy N, McCallum A (2001) Toward optimal active learning through sampling estimation of error reduction. Proc. 18th Internat. Conf. Machine Learning (Morgan Kaufmann, San Francisco), 441–448.
  • Saar-Tsechansky M, Provost F (2004) Active sampling for class probability estimation and ranking. Machine Learning 54(2):153–178.
  • Saar-Tsechansky M, Provost F (2007) Decision-centric active learning of binary-outcome models. Inform. Systems Res. 18(1):4–22.
  • Saar-Tsechansky M, Melville P, Provost F (2009) Active feature-value acquisition. Management Sci. 55(4):664–684.
  • Schilling EG (1982) Acceptance Sampling in Quality Control (CRC Press, Boca Raton, FL).
  • Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. Proc. 14th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (ACM, New York), 614–622.
  • Snow R, O’Connor B, Jurafsky D, Ng AY (2008) Cheap and fast—But is it good? Evaluating non-expert annotations for natural language tasks. Proc. Conf. Empirical Methods Natural Language Processing (Association for Computational Linguistics, Stroudsburg, PA), 254–263.
  • Wais P, Lingamneni S, Cook D, Fennell J, Goldenberg B, Lubarov D, Marin D, Simons H (2010) Towards building a high-quality workforce with Mechanical Turk. Proc. NIPS Workshop Comput. Soc. Sci. Wisdom Crowds (Curran Associates, Red Hook, NY), 1–5.
  • Wang J, Ghose A, Ipeirotis P (2012) Bonus, disclosure, and choice: What motivates the creation of high-quality paid reviews? Proc. 33rd Internat. Conf. Inform. Systems (AIS, Atlanta).
  • Welinder P, Perona P (2010) Online crowdsourcing: Rating annotators and obtaining cost-effective labels. 2010 IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognition-Workshops (IEEE, New York), 25–32.
  • Welinder P, Branson S, Belongie S, Perona P (2010) The multidimensional wisdom of crowds. Proc. 23rd Internat. Conf. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2424–2432.
  • Wetherill GB, Chiu WK (1975) A review of acceptance sampling schemes with emphasis on the economic aspect. Internat. Statist. Rev. 43(2):191–210.
  • Whitehill J, Ruvolo P, Wu T, Bergsma J, Movellan J (2009) Whose vote should count more: Optimal integration of labels from labelers of unknown expertise. Proc. 22nd Internat. Conf. Adv. Neural Inform. Processing Systems (Curran Associates, Red Hook, NY), 2035–2043.
  • Zheng Z, Padmanabhan B (2006) Selectively acquiring customer information: A new data acquisition problem and an active learning-based solution. Management Sci. 52(5):697–712.