The study introduces the Customer-level Fraud Detection Benchmark (CFDB), a structured dataset designed to overcome the limitations of traditional transaction-level fraud detection datasets. The CFDB aggregates data from three source datasets (SAML-D, AML-World-HI-Small, and AML-World-LI-Small) at the customer level, giving a more comprehensive view of customer behavior patterns and supporting the development of more sophisticated machine learning models for fraud detection.
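To make the idea of moving from transaction-level to customer-level records concrete, here is a minimal Python sketch. The field names (`sender`, `receiver`, `amount`, `is_laundering`), the chosen features, and the rule that a customer is labeled fraudulent if any of their transactions is flagged are all illustrative assumptions, not the CFDB's documented schema or labeling procedure.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    sender: str         # hypothetical field: originating account ID
    receiver: str       # hypothetical field: destination account ID
    amount: float
    is_laundering: int  # hypothetical field: 1 if the transaction is labeled suspicious


def aggregate_by_customer(transactions):
    """Roll transaction rows up into one feature record per sending customer."""
    groups = defaultdict(list)
    for tx in transactions:
        groups[tx.sender].append(tx)

    records = {}
    for customer, txs in groups.items():
        amounts = [tx.amount for tx in txs]
        records[customer] = {
            "n_tx": len(txs),
            "total_amount": sum(amounts),
            "mean_amount": sum(amounts) / len(amounts),
            "max_amount": max(amounts),
            "n_counterparties": len({tx.receiver for tx in txs}),
            # Assumed labeling rule: positive if any of the customer's transactions is positive.
            "label": int(any(tx.is_laundering for tx in txs)),
        }
    return records


# Usage with a tiny made-up transaction log.
txs = [
    Transaction("A", "X", 100.0, 0),
    Transaction("A", "Y", 300.0, 1),
    Transaction("B", "X", 50.0, 0),
]
records = aggregate_by_customer(txs)
print(records["A"])
```

Each customer becomes a single row whose features summarize their whole transaction history, which is the kind of behavioral view a customer-level benchmark is meant to expose to a model.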
The key highlights of the CFDB include:
The study evaluates several baseline machine learning models on the CFDB: Linear Regression, Decision Tree, XGBoost, and a Neural Network. The results reveal the strengths and limitations of each model and highlight the importance of assessing performance with multiple metrics rather than a single score, especially on the imbalanced datasets typical of fraud detection tasks.
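The point about imbalanced data can be illustrated with a toy example. The sketch below uses made-up labels (2 fraudulent customers out of 100) and two hypothetical classifiers; none of these numbers come from the paper. A model that never flags fraud reaches 98% accuracy while catching nothing, which is why metrics such as precision, recall, and F1 are commonly examined alongside accuracy.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 2 positives out of 100 (a 2% positive rate), chosen for illustration.
y_true = [1] * 2 + [0] * 98

always_legit = [0] * 100                   # predicts "not fraud" for every customer
flag_some = [1, 0] + [1] * 3 + [0] * 95    # catches 1 of 2 frauds, raises 3 false alarms

print(metrics(y_true, always_legit))
print(metrics(y_true, flag_some))
```

The second classifier has lower accuracy (0.96 versus 0.98) but is the only one that detects any fraud (recall 0.5, F1 of about 0.33), so ranking models by accuracy alone would pick the useless one.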
The findings emphasize the need for ongoing research and development to refine machine learning techniques, address data imbalances, and explore hybrid approaches that leverage the complementary strengths of various models. By contributing the CFDB as a valuable resource, the study aims to set a new standard in fraud detection research and empower the development of next-generation fraud detection systems.
Key insights distilled from:
by Phoebe Jing,... on arxiv.org, 04-24-2024
https://arxiv.org/pdf/2404.14746.pdf