Core Concepts
The authors propose an efficient and effective representation of node positions in graphs using a small number of representative nodes called landmarks, which are selected based on degree centrality. They provide theoretical analysis on the achievable accuracy of distance estimates via landmarks for well-known random graph models, and leverage these insights to develop the Hierarchical Position embedding with Landmarks and Clustering (HPLC) algorithm for link prediction.
Abstract
The paper focuses on learning positional information of nodes in a graph, which is important for link prediction tasks. The authors propose a representation of positional information using a small number of representative nodes called landmarks. The landmarks are selected based on their high degree centrality, which is motivated by network science theory.
The authors provide a theoretical analysis on the achievable accuracy of distance estimates via landmarks for well-known random graph models, such as Erdős-Rényi (ER) and Barabási-Albert (BA) models. For ER graphs, they show that the detour via landmarks incurs a constant factor overhead compared to the shortest path distance. For BA graphs, they prove that the strategy of choosing high-degree nodes as landmarks is asymptotically optimal, i.e., the minimum detour distance is asymptotically equal to the shortest path distance.
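To make the notion of a detour via landmarks concrete, here is a minimal sketch in Python. It assumes an unweighted, undirected graph stored as an adjacency dictionary; the helper names (`bfs_distances`, `top_degree_landmarks`, `detour_distance`) and the toy graph are illustrative assumptions, not taken from the paper. The estimate for a pair (u, v) is the minimum over landmarks l of d(u, l) + d(l, v), which by the triangle inequality can never be smaller than the true shortest-path distance.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def top_degree_landmarks(adj, k):
    """Pick the k highest-degree nodes (ties broken by smaller node id)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]

def detour_distance(landmark_dist, u, v):
    """Minimum over landmarks l of d(u, l) + d(l, v)."""
    return min(
        (d[u] + d[v] for d in landmark_dist.values() if u in d and v in d),
        default=float("inf"),  # no landmark reaches both endpoints
    )

# Toy graph (hypothetical): two hubs, 0 and 4, plus a shortcut edge 3-5.
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (4, 7), (3, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

landmarks = top_degree_landmarks(adj, k=2)
# One BFS per landmark, done once as preprocessing.
landmark_dist = {l: bfs_distances(adj, l) for l in landmarks}

exact_pair = detour_distance(landmark_dist, 1, 6)     # path 1-0-4-6 passes through a landmark
overhead_pair = detour_distance(landmark_dist, 3, 5)  # the direct edge 3-5 bypasses both landmarks
true_3_5 = bfs_distances(adj, 3)[5]
```

In this toy graph the estimate for the pair (1, 6) matches the true distance because the shortest path passes through a landmark, while the pair (3, 5) is overestimated because its direct edge avoids both hubs. The sketch only illustrates the quantity being analyzed; it says nothing about the asymptotic bounds themselves.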
Motivated by these theoretical insights, the authors propose the Hierarchical Position embedding with Landmarks and Clustering (HPLC) algorithm. HPLC partitions the graph into clusters, selects the highest-degree node in each cluster as a landmark, and computes the encoding from the distances to landmarks and the hierarchical grouping of clusters. Most of HPLC's computation can be done during preprocessing, which keeps the computational cost low.
Experiments on 7 datasets with 16 baseline methods show that HPLC achieves state-of-the-art performance in link prediction, demonstrating the effectiveness of the proposed landmark-based positional embedding approach.
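The per-cluster landmark selection and distance computation can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the partition is supplied by the caller (the summary does not say which clustering method is used), degree is measured in the whole graph rather than within the cluster, the hierarchical grouping of clusters is omitted, and the names `cluster_landmarks` and `landmark_encoding` are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def cluster_landmarks(adj, clusters):
    """Map each cluster id to its highest-degree member (ties: smaller id).

    Assumption: degree is counted over the whole graph, not within the cluster.
    """
    return {
        cid: min(members, key=lambda n: (-len(adj[n]), n))
        for cid, members in clusters.items()
    }

def landmark_encoding(adj, clusters):
    """Return the landmarks and, per node, its distances to each landmark.

    Vector entries follow the sorted order of cluster ids. Unreachable
    landmarks are encoded as infinity.
    """
    landmarks = cluster_landmarks(adj, clusters)
    order = sorted(landmarks)
    # One BFS per landmark; this is the part that can run as preprocessing.
    dist = {cid: bfs_distances(adj, landmarks[cid]) for cid in order}
    encoding = {
        node: [dist[cid].get(node, float("inf")) for cid in order]
        for node in adj
    }
    return landmarks, encoding

# Toy graph and a hand-made partition (both hypothetical).
edges = [(0, 1), (0, 2), (0, 3), (0, 4), (4, 5), (4, 6), (4, 7), (3, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
clusters = {"a": [0, 1, 2, 3], "b": [4, 5, 6, 7]}

landmarks, encoding = landmark_encoding(adj, clusters)
```

Each node ends up with one distance per cluster landmark, so the encoding length equals the number of clusters. In a fuller pipeline these vectors would be one input to the positional embedding alongside the cluster-level information described above.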
Stats
This summary reports no numerical results beyond the scale of the experiments: 7 datasets and 16 baseline methods. It focuses on the theoretical analysis of distance estimates via landmarks and the description of the HPLC algorithm.