The paper presents a two-step approach called Sieve for identifying clusters in large sparse networks.
In the first step, the method uses breadth-first search (BFS) to divide the network efficiently into its connected components, which are completely disconnected from one another. Because no edge joins two different components, this partition cuts no edges, so no true (connected) cluster is split during the partitioning process.
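The first step is ordinary connected-component labelling, which BFS does in O(n + |E|) time. Below is a minimal sketch, assuming an undirected, unweighted graph given as an edge list over nodes 0..n-1. The function name `connected_components` is illustrative and not taken from the paper.

```python
from collections import deque


def connected_components(n, edges):
    """Split nodes 0..n-1 into connected components using breadth-first search.

    Returns a list of components, each a sorted list of node ids.
    """
    # Build an undirected adjacency list.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        # BFS from an unvisited node collects everything reachable from it.
        seen[start] = True
        queue = deque([start])
        comp = []
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in adj[node]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        components.append(sorted(comp))
    return components


if __name__ == "__main__":
    # Two triangles plus an isolated node 6: three components.
    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
    print(connected_components(7, edges))  # [[0, 1, 2], [3, 4, 5], [6]]
```

Each returned component can then be clustered on its own, which is what makes the second step parallelizable and keeps each optimization problem small.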
In the second step, the method optimizes a novel objective function, S, to identify clusters within each of the disconnected components. S quantifies clustering quality as the difference between the observed number of intra-cluster edges and the number of intra-cluster edges expected in a random component of the same density. This avoids the bias against singleton and doubleton clusters that affects the commonly used modularity (Q) function.
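Read literally, the description of S suggests the following: for a component with n nodes and m edges, let the density be p = m / (n(n-1)/2); for each cluster, subtract p times its number of node pairs from its observed intra-cluster edge count, and sum over clusters. The sketch below implements only that reading. The name `s_score` and this exact formula are assumptions, and the paper's definition or normalization may differ.

```python
from fractions import Fraction


def s_score(nodes, edges, clusters):
    """Illustrative S score for ONE connected component (hypothetical reading).

    Assumed definition: for each cluster, count the observed intra-cluster
    edges and subtract p * (number of node pairs in the cluster), where p is
    the edge density of the whole component. S is the sum over clusters.
    Fractions keep the arithmetic exact.
    """
    n, m = len(nodes), len(edges)
    total_pairs = n * (n - 1) // 2
    p = Fraction(m, total_pairs) if total_pairs else Fraction(0)  # component density

    # Map each node to the index of its cluster.
    label = {v: cid for cid, cluster in enumerate(clusters) for v in cluster}

    # Observed edges whose two endpoints lie in the same cluster.
    observed = [0] * len(clusters)
    for u, v in edges:
        if label[u] == label[v]:
            observed[label[u]] += 1

    score = Fraction(0)
    for cid, cluster in enumerate(clusters):
        size = len(cluster)
        expected = p * (size * (size - 1) // 2)  # edges expected at uniform density
        score += observed[cid] - expected
    return score


if __name__ == "__main__":
    # Two triangles joined by the single edge (2, 3).
    nodes = list(range(6))
    edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
    print(s_score(nodes, edges, [[0, 1, 2], [3, 4, 5]]))  # two triangles
    print(s_score(nodes, edges, [nodes]))                  # one big cluster
    print(s_score(nodes, edges, [[v] for v in nodes]))     # all singletons
```

On this toy component (density 7/15), splitting into the two triangles gives S = 16/5, while both the single all-node cluster and the all-singletons partition give S = 0. Under this reading a singleton always contributes exactly zero rather than a negative term.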
The authors report that Sieve consistently outperforms modularity-based approaches at identifying clusters, especially in networks with high levels of noise. Experiments on synthetic networks, benchmark instances, and two large biological networks show that Sieve accurately uncovers complex community structures that modularity misses because of its resolution limit.
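Modularity's resolution limit can be seen on the standard ring-of-cliques construction: c cliques of k nodes each, with consecutive cliques joined by a single edge. The sketch below computes Q exactly for two partitions: one cluster per clique, and adjacent cliques merged in pairs (so c must be even). The function names are illustrative, and this does not reproduce the paper's experiments.

```python
from fractions import Fraction
from itertools import combinations


def ring_of_cliques(c, k):
    """Edge list for c cliques of k nodes; clique i links to clique i+1 (mod c) by one edge."""
    edges = []
    for i in range(c):
        base = i * k
        edges.extend(combinations(range(base, base + k), 2))  # clique i
        edges.append((base + k - 1, ((i + 1) % c) * k))       # bridge to next clique
    return edges


def modularity(edges, clusters):
    """Newman modularity Q = sum over clusters of L_c/m - (D_c / 2m)^2, computed exactly."""
    m = len(edges)
    label = {v: cid for cid, cluster in enumerate(clusters) for v in cluster}
    internal = [0] * len(clusters)    # L_c: edges inside cluster c
    degree_sum = [0] * len(clusters)  # D_c: total degree of cluster c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(Fraction(l, m) - Fraction(d, 2 * m) ** 2
               for l, d in zip(internal, degree_sum))


def compare(c, k):
    """Return (Q with one cluster per clique, Q with adjacent cliques merged in pairs)."""
    edges = ring_of_cliques(c, k)
    singles = [list(range(i * k, (i + 1) * k)) for i in range(c)]
    pairs = [list(range(i * k, (i + 2) * k)) for i in range(0, c, 2)]
    return modularity(edges, singles), modularity(edges, pairs)


if __name__ == "__main__":
    for c in (4, 10):
        q_single, q_pair = compare(c, 3)
        print(c, q_single, q_pair)
```

With k = 3, the one-clique-per-cluster partition scores higher for c = 4 (Q = 1/2 versus 3/8), but for c = 10 merging pairs wins (Q = 27/40 versus 13/20), even though each triangle has three internal edges and only one link to each neighbor. In general, merging two adjacent cliques in this ring raises Q exactly when c > k(k-1) + 2.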
The key highlights of the Sieve method are:
Source: arxiv.org