Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data

Yongqin Qiu
[email protected]
https://orcid.org/0009-0006-3725-2289
International Institute of Finance, School of Management, University of Science and Technology of China, Hefei 230026, China

Yuanxing Chen
[email protected]
https://orcid.org/0009-0002-1547-1721
Yau Mathematical Sciences Center, Tsinghua University, Beijing 100084, China

Kan Fang (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-0847-6906
College of Management and Economics, Tianjin University, Tianjin 300072, China; and Laboratory of Computation and Analytics of Complex Management Systems (CACMS), Tianjin University, Tianjin 300072, China

Lean Yu (Corresponding Author)
[email protected]
https://orcid.org/0000-0002-8035-4938
Business School, Sichuan University, Chengdu 610065, China

Kuangnan Fang (Corresponding Author)
[email protected]
https://orcid.org/0000-0003-0934-7281
School of Economics, Xiamen University, Xiamen 316005, China


Published Online: 27 Sep 2024
https://doi.org/10.1287/ijoc.2023.0366


INFORMS Journal on Computing

Volume 37, Issue 4

July-August 2025

Pages iv-viii, 785-1141, iii

Article Information


Received: October 13, 2023
Accepted: September 7, 2024
Published Online: September 27, 2024

Cite as

Yongqin Qiu, Yuanxing Chen, Kan Fang, Lean Yu, Kuangnan Fang (2024) Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data. INFORMS Journal on Computing 37(4):998-1017.

https://doi.org/10.1287/ijoc.2023.0366


Acknowledgments

The authors express their sincere gratitude to the area editor, an associate editor, and two referees for their constructive comments.
