A Bayesian Semisupervised Approach to Keyword Extraction with Only Positive and Unlabeled Data

Guanshen Wang
Department of Statistical Science, Southern Methodist University, Dallas, Texas 75205

Yichen Cheng
https://orcid.org/0000-0001-9881-0762
Institute for Insight, Georgia State University, Atlanta, Georgia 30303

Yusen Xia
https://orcid.org/0000-0003-2360-5574
Institute for Insight, Georgia State University, Atlanta, Georgia 30303

Qiang Ling
https://orcid.org/0000-0001-5688-4130
Department of Automation, University of Science and Technology of China, Hefei, Anhui 230026, China

Xinlei Wang (Corresponding Author)
https://orcid.org/0000-0002-8561-6511
Department of Statistical Science, Southern Methodist University, Dallas, Texas 75205; Department of Mathematics, University of Texas at Arlington, Arlington, Texas 76019; Center for Data Science Research and Education, College of Science, University of Texas at Arlington, Arlington, Texas 76019

Published Online: 28 Mar 2023
https://doi.org/10.1287/ijoc.2023.1283

INFORMS Journal on Computing

Volume 35, Issue 3

May-June 2023

Pages 519-709, C2

Information

Received: September 3, 2021
Accepted: January 23, 2023
Published Online: March 28, 2023

Cite as

Guanshen Wang, Yichen Cheng, Yusen Xia, Qiang Ling, Xinlei Wang (2023) A Bayesian Semisupervised Approach to Keyword Extraction with Only Positive and Unlabeled Data. INFORMS Journal on Computing 35(3):675-691.

https://doi.org/10.1287/ijoc.2023.1283

Acknowledgments

Computational time was generously provided by Southern Methodist University’s Center for Research Computing.
