Key Concepts
Integrating both attribute and directed structural information enhances the accuracy of multi-view clustering, as demonstrated by the novel AAS algorithm.
Summary
Bibliographic Information:
Li, X., & Zhang, X.-D. (2024). Multi-view clustering integrating anchor attribute and structural information. Preprint submitted to Neurocomputing. arXiv:2410.21711v1 [cs.LG]
Research Objective:
This paper proposes a novel multi-view clustering algorithm, called AAS, that leverages both attribute and directed structural information to improve clustering accuracy in directed networks.
Methodology:
The AAS algorithm utilizes a two-step proximity approach using anchors in each view. First, an attribute similarity matrix is constructed, enhancing the similarity between data nodes and their class-matching anchors. Then, a structural similarity matrix is built based on strongly connected components, increasing similarity among anchors within the same component. These matrices are integrated into a unified optimization framework with the NESE clustering algorithm to determine final clusters.
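The structural step described above can be illustrated with a minimal sketch. It assumes the anchors of one view are numbered 0..n-1 and joined by directed edges, finds strongly connected components with Kosaraju's algorithm, and marks anchor pairs that share a component. The function names and the 0/1 indicator matrix are hypothetical choices for illustration; the paper's actual similarity weights and its NESE-based optimization are not reproduced here.

```python
def strongly_connected_components(n, edges):
    """Return one component label per node (Kosaraju's algorithm, iterative)."""
    fwd = [[] for _ in range(n)]
    rev = [[] for _ in range(n)]
    for u, v in edges:
        fwd[u].append(v)
        rev[v].append(u)

    # Pass 1: record nodes in order of DFS finishing time on the forward graph.
    seen = [False] * n
    order = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, neighbours = stack[-1]
            nxt = next((w for w in neighbours if not seen[w]), None)
            if nxt is None:
                order.append(node)
                stack.pop()
            else:
                seen[nxt] = True
                stack.append((nxt, iter(fwd[nxt])))

    # Pass 2: sweep the reversed graph in decreasing finishing time;
    # each sweep collects exactly one strongly connected component.
    label = [-1] * n
    component = 0
    for start in reversed(order):
        if label[start] != -1:
            continue
        label[start] = component
        stack = [start]
        while stack:
            node = stack.pop()
            for w in rev[node]:
                if label[w] == -1:
                    label[w] = component
                    stack.append(w)
        component += 1
    return label


def same_component_matrix(labels):
    """Hypothetical structural similarity: 1.0 if two anchors share a component."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


# Toy directed anchor graph (illustrative data, not from the paper):
# a cycle 0 -> 1 -> 2 -> 0, a cycle 3 <-> 4, and a one-way bridge 2 -> 3.
anchor_edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3)]
anchor_labels = strongly_connected_components(5, anchor_edges)
S_struct = same_component_matrix(anchor_labels)
```

In this toy graph the one-way bridge from anchor 2 to anchor 3 does not merge the two cycles, which is the kind of directional information an undirected similarity would lose.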
Key Findings:
- AAS outperforms seven baseline clustering algorithms (K-means, NESE, GMC, LMVSC, SMC, OMSC, CAMVC) on synthetic datasets, with significant improvements on all three evaluation metrics (ACC, NMI, Purity).
- Ablation studies confirm that integrating directed structural information significantly enhances clustering accuracy compared to using attribute information alone.
- The proposed anchor selection strategy, based on directed structural information, generally improves clustering performance compared to random anchor selection.
Main Conclusions:
Integrating both attribute and directed structural information is crucial for accurate multi-view clustering in directed networks. The AAS algorithm effectively leverages this information, leading to superior performance compared to existing methods.
Significance:
This research highlights the importance of incorporating structural information in multi-view clustering, particularly for directed networks, and provides a novel algorithm, AAS, that effectively addresses this challenge.
Limitations and Future Research:
- AAS relies on specific directed network structures, limiting its applicability to other data types.
- Integrating structural information increases computational cost, potentially hindering scalability for massive datasets.
- Future research could explore alternative methods for integrating structural information and optimize AAS for improved efficiency.
Statistics
The study uses two synthetic datasets, "Attribute SBM 50" and "Attribute SBM 5000", each with 3 views and 4 clusters.
"Attribute SBM 50" contains 50 nodes, while "Attribute SBM 5000" contains 5000 nodes.
The real-world dataset "Seventh graders" includes 29 students and their friendship networks across three views.
The study compares AAS with seven other algorithms: K-means, NESE, GMC, LMVSC, SMC, OMSC, and CAMVC.
Three clustering performance metrics are used: ACC (Clustering Accuracy), NMI (Normalized Mutual Information), and Purity.
The AAS algorithm demonstrates superior performance across all three metrics compared to the baseline algorithms.
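For readers unfamiliar with the three metrics named above, the following sketch shows one common way to compute them from ground-truth and predicted labels. It is an illustration under stated assumptions, not the paper's evaluation code: ACC uses a brute-force search over cluster-to-class matchings (practical only for a handful of clusters, such as the 4 used here), and NMI is normalized by the arithmetic mean of the two entropies, which may differ from the normalization the authors used.

```python
from collections import Counter
from itertools import permutations
from math import log


def purity(y_true, y_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    per_cluster = {}
    for t, p in zip(y_true, y_pred):
        per_cluster.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in per_cluster.values()) / len(y_true)


def clustering_accuracy(y_true, y_pred):
    """ACC: best accuracy over one-to-one mappings from clusters to classes."""
    true_ids = sorted(set(y_true))
    pred_ids = sorted(set(y_pred))
    pairs = Counter(zip(y_pred, y_true))
    # Pad with None so that surplus clusters can remain unmatched.
    true_ids += [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(true_ids, len(pred_ids)):
        hits = sum(pairs[(p, t)] for p, t in zip(pred_ids, perm))
        best = max(best, hits)
    return best / len(y_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(y_true, y_pred):
    """Mutual information divided by the mean of the two label entropies."""
    n = len(y_true)
    count_t = Counter(y_true)
    count_p = Counter(y_pred)
    mutual = 0.0
    for (p, t), c in Counter(zip(y_pred, y_true)).items():
        mutual += (c / n) * log(n * c / (count_p[p] * count_t[t]))
    denom = entropy(y_true) + entropy(y_pred)
    # Both labelings constant: treat them as trivially identical.
    return 1.0 if denom == 0 else 2.0 * mutual / denom


# Toy labels (illustrative only): one point of class 0 lands in the wrong cluster.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 0, 0]
print(clustering_accuracy(y_true, y_pred), nmi(y_true, y_pred), purity(y_true, y_pred))
```

All three scores lie in [0, 1], and a prediction that matches the ground truth up to a renaming of cluster ids scores 1.0 on each.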