Core Concept
The paper introduces a novel metric called Maximal Sum of Inner Degrees Squared (MSIDS) and proposes an improved graph partitioning algorithm called DBH-X that balances the replication factor and MSIDS, leading to better performance for distributed graph processing algorithms.
Abstract
The paper examines the graph partitioning problem and introduces a new metric called Maximal Sum of Inner Degrees Squared (MSIDS). It establishes the connection between MSIDS and the replication factor (RF), which has been the main focus of theoretical work in this field.
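As a rough illustration of the two metrics, the sketch below computes both for an edge partitioning. The replication factor follows its usual definition (average number of partitions in which a vertex appears). The MSIDS computation is an assumption read off the metric's name: for each partition, sum the squared inner degrees of its vertices, then take the maximum over partitions. The paper's exact definition may differ (for example, in how edge direction is counted), and the function names here are hypothetical.

```python
from collections import defaultdict


def replication_factor(edge_partitions):
    """Average number of partitions each vertex is replicated to.

    edge_partitions: iterable of (src, dst, partition_id) triples.
    """
    vertex_parts = defaultdict(set)
    for src, dst, part in edge_partitions:
        vertex_parts[src].add(part)
        vertex_parts[dst].add(part)
    total_replicas = sum(len(parts) for parts in vertex_parts.values())
    return total_replicas / len(vertex_parts)


def msids(edge_partitions):
    """Maximal sum of inner degrees squared (assumed definition).

    The inner degree of a vertex in a partition is taken to be the number
    of edge endpoints it has inside that partition (an assumption).
    """
    inner_degree = defaultdict(lambda: defaultdict(int))
    for src, dst, part in edge_partitions:
        inner_degree[part][src] += 1
        inner_degree[part][dst] += 1
    return max(
        sum(d * d for d in degrees.values())
        for degrees in inner_degree.values()
    )


# Tiny example: four edges spread over two partitions.
example = [(0, 1, 0), (0, 2, 0), (0, 3, 1), (1, 2, 1)]
print(replication_factor(example), msids(example))
```

In this toy example, vertex 0 has inner degree 2 in partition 0, which is what dominates the squared sum there; concentrating a hub's edges in one partition raises MSIDS even when it lowers the replication factor.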
The paper proposes a new partition algorithm called DBH-X, which is an extension of the existing Degree-Based Hashing (DBH) partitioner. DBH-X introduces two key techniques:
Threshold (τ): It partitions edges based on the source vertex degree, using the vertex hash function for low-degree vertices and the DBH method for high-degree vertices.
Spread: It distributes edges among multiple sets of partitions to control the spread of edges and prevent high inner degrees within partitions.
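The two techniques above can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the degree used for the threshold test is the total degree, the DBH step hashes the lower-degree endpoint (as in the original DBH partitioner), and the `spread` mechanism shown here (offsetting the base partition by a hash of the other endpoint) is only one hypothetical way to distribute a vertex's edges over several partitions. The names `vertex_hash` and `dbh_x_partition` are invented for this example.

```python
from collections import Counter


def vertex_hash(v):
    # Multiplicative hash used as a stand-in; the real partitioner's
    # hash function is not specified here (assumption).
    return (v * 2654435761) % (2 ** 32)


def dbh_x_partition(edges, num_partitions, tau, spread=1):
    """Assign each (src, dst) edge to a partition.

    tau:    degree threshold; sources with degree <= tau are hashed directly.
    spread: number of consecutive partitions a key vertex's edges may be
            distributed over (hypothetical interpretation of "spread").
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    assignment = []
    for src, dst in edges:
        if degree[src] <= tau:
            # Low-degree source: plain vertex hashing on the source.
            key, other = src, dst
        elif degree[src] <= degree[dst]:
            # DBH rule: hash the endpoint with the lower degree.
            key, other = src, dst
        else:
            key, other = dst, src
        base = vertex_hash(key)
        offset = vertex_hash(other) % spread
        assignment.append((src, dst, (base + offset) % num_partitions))
    return assignment


# A star graph: vertex 0 is a hub connected to vertices 1..4.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(dbh_x_partition(star, num_partitions=4, tau=10))
print(dbh_x_partition(star, num_partitions=4, tau=0))
```

With a high threshold, all of the hub's edges follow the hub's hash and land in one partition (low replication, high inner degree). With a threshold of zero, the DBH rule hashes the low-degree leaves instead and the hub's edges scatter across partitions. A `spread` value above 1 sits between these extremes in this sketch.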
The theoretical analysis shows that minimizing one metric, such as the replication factor, can lead to an increase in another metric, such as MSIDS. The authors demonstrate that DBH-X can strike a balance between these two metrics, leading to improved performance compared to the baseline DBH algorithm.
Experiments on several graph datasets show that DBH-X significantly improves both the replication factor and MSIDS compared to the baseline DBH algorithm. The authors also report runtime speedups for GraphX-based PageRank and Label Propagation when the input graph is partitioned with DBH-X.
Statistics
Number of vertices (|V|) and edges (|E|) in the graph datasets used in the experiments:
uk-2002: |V| = 18,520,486, |E| = 298,113,762
graph500-22: |V| = 2,396,657, |E| = 64,155,735
graph500-24: |V| = 8,870,942, |E| = 260,379,520
graph500-25: |V| = 17,062,472, |E| = 523,602,831
graph500-26: |V| = 32,804,978, |E| = 1,051,922,823
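To put the sizes above in perspective, the following sketch derives the edges-per-vertex ratio (|E| / |V|) for each dataset. The ratio is a derived quantity computed from the listed figures, not a number reported in the summary, and the helper name `edges_per_vertex` is hypothetical.

```python
# (|V|, |E|) pairs copied from the dataset list above.
datasets = {
    "uk-2002": (18_520_486, 298_113_762),
    "graph500-22": (2_396_657, 64_155_735),
    "graph500-24": (8_870_942, 260_379_520),
    "graph500-25": (17_062_472, 523_602_831),
    "graph500-26": (32_804_978, 1_051_922_823),
}


def edges_per_vertex(num_vertices, num_edges):
    """Ratio |E| / |V|, a coarse density indicator."""
    return num_edges / num_vertices


for name, (num_vertices, num_edges) in datasets.items():
    ratio = edges_per_vertex(num_vertices, num_edges)
    print(f"{name}: {ratio:.2f} edges per vertex")
```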
Quotes
"The goal is to receive target algorithms acceleration in such a way that end-to-end time including graph partition time will be minimal."
"High inner vertex degrees within a partition can be detrimental. In Chapter 5.3, we will analyze the relationship between the replication factor and the maximal sum of inner degrees squared."
"For a rather large part of the vertices, the expectation is greater than 0.5 · di, so it makes sense to use for this part a model with bi = 0.5 instead."