Key Concepts
Effective sub-sampling of large graphs is crucial for applying divide-and-conquer algorithms to network analysis tasks, and the choice of sub-sampling routine can significantly affect how well these algorithms perform.
Summary
The paper presents a thorough comparison of seven graph sub-sampling algorithms and their impact on divide-and-conquer algorithms for community structure and core-periphery (CP) structure detection.
Key highlights:
- The authors derive theoretical results for the mis-classification rate of the divide-and-conquer algorithm for CP structure under various sub-sampling schemes.
- Extensive experiments on simulated and real-world data show that the optimal sub-sampling method depends on the specific task.
- For community detection, random node sampling performs best.
- For CP structure, sub-sampling routines that favor core nodes, such as edge sampling and random walk, consistently outperform other methods.
- The varying performance of the sub-sampling algorithms underscores the importance of carefully selecting the sub-sampling routine for the specific application.
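Three of the sub-sampling routines named above (random node sampling, edge sampling, and random walk) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the summary does not give the paper's exact definitions, so the function names, the adjacency-dictionary graph representation, the stopping rules, and the toy graph are all assumptions made for the example.

```python
import random


def build_toy_graph():
    """Hypothetical toy graph: a 3-node core (0, 1, 2) with 9 periphery nodes."""
    adj = {i: set() for i in range(12)}

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)

    for u in range(3):                 # core nodes are fully connected
        for v in range(u + 1, 3):
            add_edge(u, v)
    for p in range(3, 12):             # each periphery node attaches to one core node
        add_edge(p % 3, p)
    return adj


def random_node_sample(adj, k, rng):
    """Pick k nodes uniformly at random, ignoring the edge structure."""
    return set(rng.sample(sorted(adj), k))


def edge_sample(adj, k, rng):
    """Visit edges in random order and keep their endpoints until k nodes are held.

    A node is reached with probability that grows with its degree, so
    high-degree nodes are more likely to be included.
    """
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    rng.shuffle(edges)
    nodes = set()
    for u, v in edges:
        for w in (u, v):
            if len(nodes) < k:
                nodes.add(w)
        if len(nodes) >= k:
            break
    return nodes


def random_walk_sample(adj, k, rng, max_steps=100_000):
    """Walk from a random start node and collect visited nodes until k are held."""
    current = rng.choice(sorted(adj))
    nodes = {current}
    for _ in range(max_steps):
        if len(nodes) >= k:
            break
        neighbours = sorted(adj[current])
        if neighbours:
            current = rng.choice(neighbours)
        else:
            current = rng.choice(sorted(adj))  # restart if stuck on an isolated node
        nodes.add(current)
    return nodes
```

Calling, for example, `edge_sample(build_toy_graph(), 5, random.Random(0))` returns a set of five node labels. Because edge sampling and random walks reach a node through its edges, they tend to over-represent high-degree nodes, which is consistent with the summary's remark that these routines favor core nodes.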
Statistics
The largest community in the Political Blogs network contains 53% of all nodes.
The modularity of the community structure in the Political Blogs network ranges from 0.075 to 0.425.
The core size in the Airport network ranges from 29 to 35 nodes out of 755 total nodes.
The core-periphery metric (BE) for the Airport network ranges from 0.233 to 0.236.
The core size in the Twitch network ranges from 88 to 275 nodes out of 168,113 total nodes.
The core-periphery metric (BE) for the Twitch network ranges from 0.004 to 0.079.
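The summary reports modularity and BE values without defining either metric. The sketch below shows one way such scores could be computed, assuming that modularity means the standard Newman-Girvan modularity and that BE denotes the Borgatti-Everett metric, taken here as the Pearson correlation between the observed edges and an idealized core-periphery pattern. Both assumptions, the function names, and the adjacency-dictionary input are illustrative and may differ from what the paper uses.

```python
from math import sqrt


def modularity(adj, community):
    """Newman-Girvan modularity Q of an undirected, unweighted graph.

    adj maps each node to its set of neighbours; community maps each
    node to a community label.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for u in adj:
        for v in adj:
            if community[u] == community[v]:
                a_uv = 1.0 if v in adj[u] else 0.0
                q += a_uv - len(adj[u]) * len(adj[v]) / two_m
    return q / two_m


def be_metric(adj, core):
    """Correlation between observed edges and an ideal core-periphery pattern.

    Assumed ideal pattern: a node pair is connected if and only if at
    least one of the two nodes belongs to the core.
    """
    nodes = sorted(adj)
    observed, ideal = [], []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            observed.append(1.0 if v in adj[u] else 0.0)
            ideal.append(1.0 if (u in core or v in core) else 0.0)
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_ideal = sum(ideal) / n
    cov = sum((x - mean_obs) * (y - mean_ideal) for x, y in zip(observed, ideal))
    var_obs = sum((x - mean_obs) ** 2 for x in observed)
    var_ideal = sum((y - mean_ideal) ** 2 for y in ideal)
    if var_obs == 0 or var_ideal == 0:
        return 0.0  # correlation is undefined for a constant vector
    return cov / sqrt(var_obs * var_ideal)
```

Under these definitions, a graph whose edges match the ideal pattern exactly scores a BE value of 1, and two disconnected triangles labelled as two communities have modularity 0.5.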