Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective

Kaiquan Xu
Marketing and eBusiness Department, School of Business, Nanjing University, Nanjing 210093, China

Stephen Shaoyi Liao
Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong

Raymond Y. K. Lau
Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong

J. Leon Zhao
Department of Information Systems, City University of Hong Kong, Kowloon, Hong Kong

Published Online: 28 Feb 2014
https://doi.org/10.1287/ijoc.2013.0578

INFORMS Journal on Computing

Volume 26, Issue 3

Summer 2014

Pages 415-643

Information

Received: December 01, 2011
Accepted: June 01, 2013
Published Online: February 28, 2014

Cite as

Kaiquan Xu, Stephen Shaoyi Liao, Raymond Y. K. Lau, J. Leon Zhao (2014) Effective Active Learning Strategies for the Use of Large-Margin Classifiers in Semantic Annotation: An Optimal Parameter Discovery Perspective. INFORMS Journal on Computing 26(3):461-483.

https://doi.org/10.1287/ijoc.2013.0578
