Core Concepts
A federated multi-label feature selection method that leverages mutual information and correlation distance to identify relevant and non-redundant features across distributed multi-label datasets in IoT environments.
Abstract
The proposed FMLFS method addresses the high dimensionality of multi-label datasets generated by IoT devices and the noisy, redundant, or irrelevant features they contain. It introduces a federated approach to multi-label feature selection in which the mutual information between features and labels serves as the relevancy metric, and the correlation distance between features, derived from mutual information and joint entropy, serves as the redundancy measure.
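As an illustration of the two metrics, the following is a minimal sketch for discrete (or pre-discretized) feature and label columns. The summary does not give the exact formula for the correlation distance, so the form used here, 1 - I(X;Y) / H(X,Y), is an assumption chosen because it combines mutual information and joint entropy; all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def joint_entropy(x, y):
    """Joint entropy H(X, Y) of two equally long discrete sequences."""
    return entropy(list(zip(x, y)))

def mutual_information(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y); used here as the relevancy score."""
    return entropy(x) + entropy(y) - joint_entropy(x, y)

def correlation_distance(x, y):
    """Assumed form: 1 - I(X; Y) / H(X, Y).

    0 means the two columns carry the same information (fully redundant);
    1 means they are independent.
    """
    h_xy = joint_entropy(x, y)
    if h_xy == 0.0:
        # Both columns are constant: treat them as fully redundant.
        return 0.0
    return 1.0 - mutual_information(x, y) / h_xy
```

Under this assumed definition, a feature compared with an identical copy of itself has distance 0, while two statistically independent features have distance 1.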
The FMLFS algorithm comprises two phases:
- Local phase: Each client computes the mutual information and correlation distance measures for its local dataset and sends them to the edge server.
- Global phase: The edge server aggregates the received metrics, transforms the multi-label feature selection problem into a bi-objective optimization problem, and employs Pareto-based dominance and crowding distance strategies to rank the features. The ranked features are then sent back to the clients.
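The ranking step of the global phase can be sketched as follows. This is a generic non-dominated sorting with crowding distance, not the authors' implementation. It assumes each feature has already been reduced to a pair of aggregated scores, (relevance, non-redundancy), both to be maximized; how the server aggregates client metrics and builds these two objectives is not specified in this summary, and all names are hypothetical.

```python
import math

def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_fronts(points):
    """Split feature indices into successive non-dominated fronts."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

def crowding_distance(points, front):
    """Crowding distance of each index in one front (boundary points get infinity)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(points[front[0]])):
        ordered = sorted(front, key=lambda i: points[i][m])
        lo, hi = points[ordered[0]][m], points[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = math.inf
        if hi == lo:
            continue  # all values equal on this objective
        for k in range(1, len(ordered) - 1):
            gap = points[ordered[k + 1]][m] - points[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist

def rank_features(points):
    """Rank features front by front; within a front, less crowded features come first."""
    ranking = []
    for front in pareto_fronts(points):
        dist = crowding_distance(points, front)
        ranking.extend(sorted(front, key=lambda i: -dist[i]))
    return ranking

# Hypothetical aggregated scores: (relevance, non-redundancy) per feature.
scores = [(0.9, 0.9), (0.5, 0.5), (1.0, 0.2), (0.2, 1.0), (0.1, 0.1)]
ranking = rank_features(scores)
```

In this toy input, features 0, 2, and 3 are mutually non-dominated and form the first front, so they are ranked ahead of features 1 and 4, which are dominated by feature 0.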
The proposed method is evaluated in two scenarios: 1) transmitting reduced-size datasets to the edge server for centralized classifier usage, and 2) employing federated learning with reduced-size datasets. The results demonstrate that FMLFS outperforms five other comparable methods in the literature and provides a good trade-off between performance, time complexity, and communication cost on three real-world multi-label datasets.
Stats
The Yeast dataset has 2417 instances, 103 features, and 14 labels; the Scene dataset has 2407 instances, 294 features, and 6 labels; and the Birds dataset has 645 instances, 260 features, and 19 labels.
Quotes
"The presence of noisy, redundant, or irrelevant features in these datasets, along with the curse of dimensionality, poses challenges for multi-label classifiers."
"Feature selection (FS) proves to be an effective strategy in enhancing classifier performance and addressing these challenges."
"There is currently no existing distributed multi-label FS method documented in the literature that is suitable for distributed multi-label datasets within IoT environments."