This research paper investigates the effectiveness of different machine learning algorithms, namely CatBoost, XGBoost, and Explainable Boosting Machine (EBM), in detecting phishing websites. The study emphasizes the crucial role of feature selection and model interpretability in improving detection accuracy and efficiency.
Research Objective:
The study aims to determine the most effective feature selection methods and machine learning algorithms for accurately and efficiently detecting phishing websites. It also explores the use of Explainable AI (XAI) techniques to understand the influence of different features on model predictions.
Methodology:
The researchers collected datasets from several sources, including UCI Phishing Websites, Kaggle, and Mendeley Data. They applied Recursive Feature Elimination (RFE) to identify the features most relevant to phishing detection. The selected features were then used to train CatBoost, XGBoost, and EBM models, whose performance was assessed on accuracy, precision, recall, and processing time. Additionally, SHAP (SHapley Additive exPlanations) analysis was used to quantify feature importance and interpret the models' predictions.
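The RFE-then-evaluate loop above can be sketched in standard-library Python. This is a minimal sketch, not the authors' code. It assumes that a fitted model supplies a per-feature importance score, and here the hypothetical `mean_gap_importance` (the absolute difference between a feature's mean on phishing and legitimate rows) stands in for the importances CatBoost, XGBoost, or EBM would report. The toy data and the rule-based `predict` function are invented for illustration.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]
ImportanceFn = Callable[[Matrix, Sequence[int], List[int]], Dict[int, float]]


def mean_gap_importance(X: Matrix, y: Sequence[int], features: List[int]) -> Dict[int, float]:
    """Hypothetical stand-in for model feature importances: the absolute gap
    between a feature's mean on phishing (1) and legitimate (0) rows."""
    scores = {}
    for f in features:
        pos = [row[f] for row, label in zip(X, y) if label == 1]
        neg = [row[f] for row, label in zip(X, y) if label == 0]
        scores[f] = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return scores


def rfe(X: Matrix, y: Sequence[int], n_keep: int,
        importance: ImportanceFn, step: int = 1) -> List[int]:
    """Recursive Feature Elimination: score the remaining features,
    drop the `step` weakest, and repeat until `n_keep` remain."""
    features = list(range(len(X[0])))
    while len(features) > n_keep:
        scores = importance(X, y, features)  # a real model would be refit here
        n_drop = min(step, len(features) - n_keep)
        weakest = set(sorted(features, key=lambda f: scores[f])[:n_drop])
        features = [f for f in features if f not in weakest]
    return features


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision and recall with phishing (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def timed(fn: Callable, *args) -> Tuple[object, float]:
    """Return fn(*args) together with its wall-clock processing time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Toy data (invented): 3 features, labels 1 = phishing, 0 = legitimate.
    X = [[1, 0, 5], [1, 1, 4], [0, 0, 4], [0, 1, 4]]
    y = [1, 1, 0, 0]
    kept = rfe(X, y, n_keep=2, importance=mean_gap_importance)
    print(kept)  # [0, 2]: feature 1 carries no class signal and is dropped first

    def predict(rows):
        # Hypothetical rule: flag a row as phishing when the top kept feature is 1.
        return [int(r[kept[0]] == 1) for r in rows]

    y_pred, seconds = timed(predict, X)
    print(evaluate(y, y_pred), f"{seconds:.6f}s")
```

With a real gradient-boosted model the importances are recomputed on the surviving features after each refit, which is why RFE eliminates features recursively rather than in one pass. With the independent per-feature stand-in used here, the result coincides with simply keeping the top-scoring features.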
Key Findings:
Main Conclusions:
Significance:
This research contributes to the development of more robust, efficient, and interpretable phishing detection systems. The findings have practical implications for cybersecurity professionals in building effective defenses against phishing attacks.
Limitations and Future Research:
The study focused primarily on URL-based features. Future research could incorporate content-based and external-based features to improve detection accuracy. Hybrid models that combine multiple algorithms could also be investigated as a way to further improve performance.
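URL-based features can be computed from the URL string alone. Content-based features require downloading the page, and external-based features depend on information from outside sources. The sketch below shows a few lexical URL features of this kind. They are loosely modeled on attributes found in public phishing datasets such as UCI Phishing Websites (IP address as host, URL length, '@' symbol, hyphenated domain), but the function `url_features` and its feature names are hypothetical and are not the paper's exact feature set.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Lexical features computed from the URL string alone (no page fetch).
    Feature names are illustrative, not the paper's exact feature set."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lowercased; any "user@" prefix is stripped
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    after_scheme = url.split("://", 1)[-1]
    return {
        "url_length": len(url),
        "host_is_ip": host_is_ip,             # raw IP instead of a domain name
        "has_at_symbol": "@" in url,          # text before '@' is user info, hiding the real host
        "hyphen_in_host": "-" in host,        # e.g. brand-name-login.example
        # Rough subdomain count: dots beyond a two-label domain like example.com
        "subdomain_dots": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "uses_https": parsed.scheme == "https",
        "double_slash_redirect": "//" in after_scheme,
    }


if __name__ == "__main__":
    for u in ["http://192.168.0.1/login",
              "https://secure-login.example.com/account",
              "http://user@evil.example//redirect"]:
        print(u, url_features(u))
```

The subdomain count is deliberately crude: it ignores multi-label public suffixes such as `co.uk`, which a production extractor would resolve against a public-suffix list.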
Key Insights Distilled From
by Abdullah Faj... at arxiv.org 11-12-2024
https://arxiv.org/pdf/2411.06860.pdf