Fraud Detection by Integrating Multisource Heterogeneous Presence-Only Data

Published Online: https://doi.org/10.1287/ijoc.2023.0366

References

  • Awoyemi JO, Adetunmbi AO, Oluwadare SA (2017) Credit card fraud detection using machine learning techniques: A comparative analysis. 2017 Internat. Conf. Comput. Networking Informatics (ICCNI) (IEEE, Piscataway, NJ), 1–9.
  • Bao Y, Hilary G, Ke B (2022) Artificial intelligence and fraud detection. Babich V, Birge JR, Hilary G, eds. Innovative Technology at the Interface of Finance and Operations, Springer Series in Supply Chain Management, vol. 11 (Springer, Cham, Switzerland), 223–247.
  • Bekker J, Davis J (2020) Learning from positive and unlabeled data: A survey. Machine Learn. 109(4):719–760.
  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations Trends Machine Learn. 3(1):1–122.
  • Breheny P, Huang J (2009) Penalized methods for bi-level variable selection. Statist. Its Interface 2(3):369–380.
  • Cai T, Liu M, Xia Y (2022) Individual data protected integrative regression analysis of high-dimensional heterogeneous data. J. Amer. Statist. Assoc. 117(540):2105–2119.
  • Cao H, Zhou J, Schwarz E (2019) RMTL: An R library for multi-task learning. Bioinformatics 35(10):1797–1798.
  • Chen J, Tran-Dinh Q, Kosorok MR, Liu Y (2021) Identifying heterogeneous effect using latent supervised clustering with adaptive fusion. J. Comput. Graphical Statist. 30(1):43–54.
  • Chen S, Qiu Y, Li J, Fang K, Fang K (2023) Precision marketing for financial industry using a PU-learning recommendation method. J. Bus. Res. 160:113771.
  • Claesen M, De Smet F, Suykens JA, De Moor B (2015) A robust ensemble approach to learn from positive and unlabeled data using SVM base models. Neurocomputing 160:73–84.
  • Djeundje VB, Crook J, Calabrese R, Hamid M (2021) Enhancing credit scoring with alternative data. Expert Systems Appl. 163:113766.
  • Duan Y, Wang K (2023) Adaptive and robust multi-task learning. Ann. Statist. 51(5):2015–2039.
  • Dumitrescu E, Hué S, Hurlin C, Tokpavi S (2022) Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects. Eur. J. Oper. Res. 297(3):1178–1192.
  • Elkan C, Noto K (2008) Learning classifiers from only positive and unlabeled data. Proc. 14th ACM SIGKDD Internat. Conf. Knowledge Discovery Data Mining (Association for Computing Machinery, New York), 213–220.
  • Fan X, Liu M, Fang K, Huang Y, Ma S (2017) Promoting structural effects of covariates in the cure rate model with penalization. Statist. Methods Medical Res. 26(5):2078–2092.
  • Fang W, Chen C, Song B, Wang L, Zhou J, Zhu KQ (2019) Adapted tree boosting for transfer learning. 2019 IEEE Internat. Conf. Big Data (Big Data) (IEEE, Piscataway, NJ), 741–750.
  • He K, Wang Y, Xie X, Shao D (2024) A multi-task positive-unlabeled learning framework to predict secreted proteins in human body fluids. Complex Intelligent Systems 10(1):1319–1331.
  • He Y, Zhou L, Xia Y, Lin H (2023b) Center-augmented ℓ2-type regularization for subgroup learning. Biometrics 79(3):2157–2170.
  • He H, Wang Z, Jain H, Jiang C, Yang S (2023a) A privacy-preserving decentralized credit scoring method based on multi-party information. Decision Support Systems 166:113910.
  • Höppner S, Baesens B, Verbeke W, Verdonck T (2022) Instance-dependent cost-sensitive learning for detecting transfer fraud. Eur. J. Oper. Res. 297(1):291–300.
  • Huang Y, Zhang Q, Zhang S, Huang J, Ma S (2017) Promoting similarity of sparsity structures in integrative analysis with penalization. J. Amer. Statist. Assoc. 112(517):342–350.
  • Jacob L, Vert J-P, Bach F (2008) Clustered multi-task learning: A convex formulation. Koller D, Schuurmans D, Bengio Y, Bottou L, eds. Proc. 21st Internat. Conf. Neural Inform. Processing Systems, vol. 21 (Curran Associates Inc., Red Hook, NY), 745–752.
  • Jain S, White M, Radivojac P (2017) Recovering true classifier performance in positive-unlabeled learning. Proc. AAAI Conf. Artificial Intelligence, vol. 31 (Association for the Advancement of Artificial Intelligence, Palo Alto, CA).
  • Jowkar GH, Mansoori EG (2016) Perceptron ensemble of graph-based positive-unlabeled learning for disease gene identification. Comput. Biol. Chemistry 64:263–270.
  • Kolosov N, Daly MJ, Artomov M (2021) Prioritization of disease genes from GWAS using ensemble-based positive-unlabeled learning. Eur. J. Human Genetics 29(10):1527–1535.
  • Langevin A, Cody T, Adams S, Beling P (2022) Generative adversarial networks for data augmentation and transfer in credit card fraud detection. J. Oper. Res. Soc. 73(1):153–180.
  • Lebichot B, Le Borgne YA, He-Guelton L, Oblé F, Bontempi G (2020) Deep-learning domain adaptation techniques for credit cards fraud detection. Oneto L, Navarin N, Sperduti A, Anguita D, eds. Recent Adv. Big Data Deep Learn. Proc. INNS Big Data Deep Learn. Conf. INNSBDDL2019 (Springer International Publishing, Cham), 78–88.
  • Lebichot B, Verhelst T, Le Borgne YA, He-Guelton L, Oblé F, Bontempi G (2021) Transfer learning strategies for credit card fraud detection. IEEE Access 9:114754–114766.
  • Liu J, Ma S, Huang J (2014) Integrative analysis of cancer diagnosis studies with composite penalization. Scand. J. Statist. 41(1):87–103.
  • Liu B, Che Z, Zhong H, Xiao Y (2023a) A ranking based multi-view method for positive and unlabeled graph classification. IEEE Trans. Knowledge Data Engrg. 35(3):2220–2230.
  • Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. Third IEEE Internat. Conf. Data Mining (IEEE, Piscataway, NJ), 179–186.
  • Liu B, Peng T, Xiao Y, Zhao S, Sun P, Li X, Zheng Z, Huang Y (2023b) Self-paced multi-view positive and unlabeled graph learning with auxiliary information. Inform. Sci. 642:119146.
  • Lu F, Bai Q (2010) Semi-supervised text categorization with only a few positive and unlabeled documents. 2010 3rd Internat. Conf. Biomedical Engrg. Informatics, vol. 7 (IEEE, Piscataway, NJ), 3075–3079.
  • Ma S, Huang J (2017) A concave pairwise fusion approach to subgroup analysis. J. Amer. Statist. Assoc. 112(517):410–423.
  • Ortega Vázquez C, vanden Broucke S, De Weerdt J (2024) Hellinger distance decision trees for PU learning in imbalanced data sets. Machine Learn. 113(7):4547–4578.
  • Qin X, Zhang Y, Li C, Li X (2013) Learning from data streams with only positive and unlabeled data. J. Intelligent Inform. Systems 40(3):405–430.
  • Qiu Y, Chen Y, Fang K, Yu L, Fang K (2024) Fraud detection by integrating multisource heterogeneous presence-only data. http://dx.doi.org/10.1287/ijoc.2023.0366.cd, https://github.com/INFORMSJoC/2023.0366.
  • Rtayli N, Enneya N (2020) Enhanced credit card fraud detection based on SVM-recursive feature elimination and hyper-parameters optimization. J. Inform. Security Appl. 55:102596.
  • Sahin Y, Duman E (2011) Detecting credit card fraud by ANN and logistic regression. 2011 Internat. Sympos. Innovations Intelligent Systems Appl. (IEEE, Piscataway, NJ), 315–319.
  • Seera M, Lim CP, Kumar A, Dhamotharan L, Tan KH (2024) An intelligent payment card fraud detection system. Ann. Oper. Res. 334(1):445–467.
  • Somasundaram A, Reddy S (2019) Parallel and incremental credit card fraud detection model to handle concept drift and data imbalance. Neural Comput. Appl. 31(Suppl. 1):3–14.
  • Song H, Raskutti G (2019) PUlasso: High-dimensional variable selection with presence-only data. J. Amer. Statist. Assoc. 115(529):334–347.
  • Tang L, Song PX (2016) Fused lasso approach in regression coefficients clustering: Learning parameter heterogeneity in data integration. J. Machine Learn. Res. 17(1):3915–3937.
  • Tang X, Xue F, Qu A (2021) Individualized multidirectional variable selection. J. Amer. Statist. Assoc. 116(535):1280–1296.
  • Vinay MS, Yuan S, Wu X (2022) Fraud detection via contrastive positive unlabeled learning. 2022 IEEE Internat. Conf. Big Data (Big Data) (IEEE, Piscataway, NJ), 1475–1484.
  • Wang L, Jia F, Chen L, Xu Q (2023) Forecasting SMEs’ credit risk in supply chain finance with a sampling strategy based on machine learning techniques. Ann. Oper. Res. 331(1):1–33.
  • Wang S, Shi X, Wu M, Ma S (2019) Horizontal and vertical integrative analysis methods for mental disorders omics data. Sci. Rep. 9(1):1–12.
  • Ward G, Hastie T, Barry S, Elith J, Leathwick JR (2009) Presence-only data and the EM algorithm. Biometrics 65(2):554–563.
  • Xiao J, Tian Y, Jia Y, Jiang X, Yu L, Wang S (2023) Black-box attack-based security evaluation framework for credit card fraud detection models. INFORMS J. Comput. 35(5):986–1001.
  • Xu Y, Xu C, Xu C, Tao D (2017) Multi-positive and unlabeled learning. Proc. 26th Internat. Joint Conf. Artificial Intelligence (AAAI Press, Palo Alto, CA), 3182–3188.
  • Yang F, Abedin MZ, Hajek P (2024) An explainable federated learning and blockchain-based secure credit modeling method. Eur. J. Oper. Res. 317(2):449–467.
  • Yang X, Yan X, Huang J (2019) High-dimensional integrative analysis with homogeneity and sparsity recovery. J. Multivariate Anal. 174:104529.
  • Yang P, Li X, Chua HN, Kwoh CK, Ng SK (2014) Ensemble positive unlabeled learning for disease gene identification. PLoS One 9(5):e97079.
  • Yu S, Li C (2007) PE-PUC: A graph based PU-learning approach for text classification. Perner P, ed. Machine Learn. Data Mining Pattern Recognition 5th Internat. Conf. MLDM 2007 (Springer Berlin Heidelberg, Berlin, Heidelberg), 574–584.
  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38(2):894–942.
  • Zhang Y, Yang Q (2021) A survey on multi-task learning. IEEE Trans. Knowledge Data Engrg. 34(12):5586–5609.
  • Zhao H, Zhao C, Zhang X, Liu N, Zhu H, Liu Q, Xiong H (2023) An ensemble learning approach with gradient resampling for class-imbalance problems. INFORMS J. Comput. 35(4):747–763.
  • Zheng L, Liu G, Yan C, Jiang C, Zhou M, Li M (2020) Improved TrAdaBoost and its application to transaction fraud detection. IEEE Trans. Comput. Soc. Systems 7(5):1304–1316.
  • Zhou JT, Pan SJ, Mao Q, Tsang IW (2012) Multi-view positive and unlabeled learning. Hoi SCH, Buntine W, eds. Asian Conf. Machine Learn. (PMLR, New York), 555–570.