Key Concepts
To address the class imbalance problem in the Bot-IoT dataset, a binary classification method using the synthetic minority over-sampling technique (SMOTE) is proposed to detect attack packets in IoT network traffic.
Summary
The thesis addresses the class imbalance problem in the Bot-IoT dataset, in which attack packets vastly outnumber normal packets.
The key highlights and insights are:
Preprocessing: Feature selection is performed using random forest, mutual information, and chi-squared algorithms to tackle the curse of dimensionality. One-hot encoding is applied to categorical features and min-max normalization is used to improve classifier performance.
Data Sampling: The SMOTE algorithm is used to generate synthetic samples of the minority (normal) class to balance the dataset, ensuring an equal number of normal and attack packets for training.
Binary Classifiers: Several binary classifiers are investigated, including logistic regression, linear SVM, RBF kernel SVM, random forest, XGBoost, and multi-layer perceptron (MLP). These classifiers are trained on both the imbalanced and balanced datasets to evaluate the impact of class imbalance.
Performance Evaluation: The classifiers are evaluated using metrics such as accuracy, recall, precision, false positive rate (FPR), false negative rate (FNR), F1-score, and area under the ROC curve (AUC-score). Inference time is also measured.
Key Findings:
All classifiers achieve high accuracy, recall, and precision on the imbalanced dataset but exhibit a high FPR, because their predictions are skewed toward the majority (attack) class.
Classifiers trained on the SMOTE-balanced dataset achieve similar accuracy, recall, and precision but a significantly lower FPR.
The inference time of linear and RBF kernel SVM increases on the balanced dataset due to the larger training size, while other classifiers maintain similar inference times.
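The data sampling step described above can be illustrated with a minimal pure-Python sketch of the SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a simplified illustration, not the thesis's implementation; the function name `smote`, its parameters, and the toy data are assumptions made here, and a real pipeline would more likely rely on an existing library implementation.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new points interpolated between minority samples.

    Simplified variant: the base sample is drawn at random for every
    synthetic point (hypothetical helper, not the thesis's code).
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # For each minority sample, record the indices of its k nearest
    # minority neighbours (brute force, fine for a small minority class).
    neighbours = []
    for i, p in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: sq_dist(p, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        synthetic.append(
            [a + gap * (b - a) for a, b in zip(minority[i], minority[j])]
        )
    return synthetic


# Toy data (made up): 4 "normal" and 10 "attack" samples with 2 features,
# already scaled to [0, 1] as min-max normalization would do.
normal = [[0.10, 0.20], [0.15, 0.25], [0.20, 0.10], [0.12, 0.18]]
attack = [[0.80 + 0.01 * i, 0.90 - 0.01 * i] for i in range(10)]

new_normal = smote(normal, n_synthetic=len(attack) - len(normal), k=3)
balanced_normal = normal + new_normal
print(len(balanced_normal), len(attack))
```

Because every synthetic point is an interpolation between two real minority samples, the new samples stay inside the region already occupied by the normal class instead of duplicating existing records.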
The proposed method effectively addresses the class imbalance problem in the Bot-IoT dataset, enabling accurate detection of both normal and attack packets in IoT network traffic.
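To make the reported pattern (high accuracy alongside high FPR) concrete, the sketch below computes the listed metrics from a confusion matrix, treating attack as the positive class (label 1) and normal as the negative class (label 0), in line with the labels given in the statistics. The helper name `binary_metrics` and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not results from the thesis.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common binary-classification metrics from raw counts.

    Assumes attack = positive class and that every denominator is non-zero.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "fpr": fp / (fp + tn),   # normal packets wrongly flagged as attacks
        "fnr": fn / (fn + tp),   # attack packets missed
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical test split: 100 normal and 100,000 attack packets.
# The classifier flags 60 of the 100 normal packets as attacks.
m = binary_metrics(tp=99_990, fp=60, tn=40, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

In this made-up example accuracy, precision, and recall all exceed 0.999 while the FPR is 0.6, because the few normal packets barely influence metrics dominated by the attack class.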
Statistics
The Bot-IoT dataset contains over 72 million records, with 43 features per record and a class label indicating normal (0) or attack (1) traffic.
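The class distribution is highly imbalanced: the reported counts are 477 normal packets and 3,668,041 attack packets. These counts total about 3.67 million records, which suggests they refer to a subset of the full dataset.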
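The scale of the imbalance follows directly from the two class counts quoted above. The short calculation below derives the attack-to-normal ratio, the share of normal packets, and the number of synthetic normal samples that would be required to equalize the classes. The last figure assumes, purely for illustration, that balancing is applied to these full counts; the summary states that balancing targets the training data, whose exact size is not given here.

```python
# Class counts quoted in the statistics above.
normal = 477
attack = 3_668_041

total = normal + attack
ratio = attack / normal             # attack packets per normal packet
normal_share = normal / total       # fraction of records that are normal
synthetic_needed = attack - normal  # assumption: balance the full counts

print(f"total records:        {total:,}")
print(f"attack : normal:      {ratio:,.1f} : 1")
print(f"normal share:         {normal_share:.4%}")
print(f"synthetic samples:    {synthetic_needed:,}")
```

Roughly 7,690 attack packets exist for every normal packet, so a classifier can reach very high accuracy while misclassifying most normal traffic.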
Quotes
"To address the class imbalance problem in the Bot-IoT dataset, we propose a binary classification method with synthetic minority over-sampling techniques (SMOTE)."
"The proposed classifier aims at detecting the attack packets and overcoming the class imbalance problem with the help of SMOTE algorithm."