Core Concepts
Equilibrium K-Means (EKM) is a fuzzy clustering algorithm that stays robust on imbalanced data by preventing its centroids from crowding together in the center of large clusters.
Abstract
This work introduces Equilibrium K-Means (EKM), a clustering algorithm designed to handle imbalanced data.
Key highlights:
- Traditional clustering algorithms such as Hard K-Means (HKM) and Fuzzy K-Means (FKM) suffer from the "uniform effect": they tend to produce clusters of similar size even when the underlying groups vary widely in size.
- EKM addresses this issue by introducing repulsive forces between centroids, preventing them from crowding together in the center of large clusters.
- EKM is a fuzzy clustering algorithm with a clear physical interpretation: it aims to minimize the expected energy of the data points under a Boltzmann distribution.
- EKM has the same time and space complexity as FKM, making it scalable to large datasets.
- In experiments on synthetic and real-world datasets, EKM outperforms other centroid-based algorithms, including HKM, FKM, and variants designed for imbalanced data, when the data are imbalanced.
- EKM can also be combined with deep neural networks (DNNs) for deep clustering of imbalanced data, where it outperforms HKM.
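The physical interpretation in the highlights above can be made concrete with a small sketch. It assumes, for illustration only, that a point's energy with respect to a centroid is half its squared Euclidean distance, and that memberships follow a Boltzmann distribution over those energies with an inverse-temperature parameter alpha. The function names, the energy definition, and alpha are hypothetical choices, not taken from the source. The sketch only evaluates memberships and the expected energy for fixed centroids; it is not the EKM centroid update, which this summary does not spell out.

```python
import numpy as np

def boltzmann_memberships(X, centroids, alpha=1.0):
    """Return soft memberships p[n, k] proportional to exp(-alpha * energy[n, k])."""
    # Assumed energy: half the squared Euclidean distance to each centroid.
    diff = X[:, None, :] - centroids[None, :, :]
    energy = 0.5 * np.sum(diff ** 2, axis=2)  # shape (N, K)
    # Subtracting the row minimum leaves the normalized result unchanged
    # but avoids underflow when alpha * energy is large.
    weights = np.exp(-alpha * (energy - energy.min(axis=1, keepdims=True)))
    p = weights / weights.sum(axis=1, keepdims=True)
    return p, energy

def expected_energy(X, centroids, alpha=1.0):
    """Sum over points of the membership-weighted (expected) energy."""
    p, energy = boltzmann_memberships(X, centroids, alpha)
    return float(np.sum(p * energy))

# Toy imbalanced data: one large group (500 points) and one small group (20 points).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 2)),
    rng.normal(6.0, 0.5, size=(20, 2)),
])
centroids = np.array([[0.0, 0.0], [6.0, 6.0]])

p, energy = boltzmann_memberships(X, centroids, alpha=1.0)
J = expected_energy(X, centroids, alpha=1.0)
```

Under these assumptions, a larger alpha concentrates each point's membership on its nearest centroid, while alpha = 0 spreads membership uniformly across all centroids.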
Stats
EKM has the same time complexity, O(NK), and the same space complexity as FKM.
EKM has a batch-learning version that can be applied to large datasets.
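To illustrate where an O(NK) cost comes from, the sketch below computes soft memberships for N points against K centroids one batch at a time, so each point-centroid pair is visited once per pass and only one batch of pairs is held in memory at a time. This is not the paper's batch-learning rule, which the summary does not describe. The function name, the Boltzmann-style membership, the half-squared-distance energy, and the parameters alpha and batch_size are all assumptions made for illustration.

```python
import numpy as np

def memberships_in_batches(X, centroids, alpha=1.0, batch_size=256):
    """Compute an (N, K) soft-membership matrix one batch of rows at a time."""
    N, K = X.shape[0], centroids.shape[0]
    out = np.empty((N, K))
    for start in range(0, N, batch_size):
        batch = X[start:start + batch_size]
        # Assumed energy: half the squared Euclidean distance, shape (B, K).
        diff = batch[:, None, :] - centroids[None, :, :]
        energy = 0.5 * np.sum(diff ** 2, axis=2)
        # Stabilized Boltzmann-style weights, normalized per point.
        w = np.exp(-alpha * (energy - energy.min(axis=1, keepdims=True)))
        out[start:start + batch_size] = w / w.sum(axis=1, keepdims=True)
    return out

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 3))   # N = 1000 points
C = rng.normal(size=(4, 3))      # K = 4 centroids

chunked = memberships_in_batches(X, C, batch_size=7)
full = memberships_in_batches(X, C, batch_size=X.shape[0])
```

Because each row is normalized independently, the chunked result matches the full-batch result, and the batch size only trades memory for the number of loop iterations.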
Quotes
"EKM belongs to the family of fuzzy clustering and membership defined in EKM has a clear physical meaning. Repulsive forces appear among centroids of EKM, successfully reducing the uniform effect by preventing centroids from crowding together in a large cluster."
"When tested on an imbalanced dataset derived from MNIST, joint learning of DNNs and EKM improves clustering accuracy by 35% compared to joint learning of DNNs and HKM."