Learning from Crowdsourced Multi-labeling: A Variational Bayesian Approach
