Core Concepts
The maximum capability of a topological feature in link prediction follows a simple yet theoretically validated expression, which only depends on the extent to which the feature is held in missing and nonexistent links.
Abstract
The paper aims to unveil the maximum capability of a topological feature in link prediction. It introduces a theoretical framework that is compatible with different indexes to gauge the feature, different prediction approaches to utilize the feature, and different metrics to quantify the prediction performance.
The key findings are:
The maximum capability of a topological feature follows a simple yet theoretically validated expression, which only depends on the extent to which the feature is held in missing and nonexistent links. This means that a family of indexes based on the same feature shares the same upper bound of prediction accuracy.
Consequently, the potential of every other index built on the same feature can be readily estimated through one single measurement of the feature's prevalence in the positive samples (missing links) and the negative samples (nonexistent links).
Supervised prediction in principle gives a more accurate result than unsupervised prediction. Using the supervised method lifts the maximum capability of the topological feature, and the size of this lift can be mathematically quantified.
Using the common neighbor feature as an example, the paper shows how the interplay of different structural characteristics, such as the numbers of closed and open triangles, determines the prediction performance in different networks. The clustering coefficient alone cannot fully explain this performance.
The results are verified on 550 structurally diverse networks, demonstrating the universality of the pattern uncovered. The findings can guide feature and method selection, and they shed light on the network characteristics that make a topological feature effective in link prediction.
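The single measurement of feature prevalence mentioned in the findings above can be sketched in Python for the common neighbor feature. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the toy network, and the rule "a node pair holds the feature if it has at least one common neighbor in the observed network" are all hypothetical choices made for the example. The helper `indicator_auc` is a standard identity for the AUC of a binary score; it is not the paper's upper-bound expression, which this summary does not reproduce.

```python
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbors (undirected, simple graph)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def holds_feature(adj, u, v):
    """Assumed feature test: the pair shares at least one common neighbor."""
    return bool(adj.get(u, set()) & adj.get(v, set()))


def feature_prevalence(nodes, observed_edges, missing_edges):
    """Return (p1, p2): the fraction of missing links and the fraction of
    nonexistent links that hold the feature in the observed network."""
    adj = build_adjacency(observed_edges)
    known = {frozenset(e) for e in observed_edges}
    known |= {frozenset(e) for e in missing_edges}
    # Nonexistent links: node pairs that are neither observed nor missing.
    nonexistent = [p for p in combinations(sorted(nodes), 2)
                   if frozenset(p) not in known]
    p1 = sum(holds_feature(adj, u, v) for u, v in missing_edges) / len(missing_edges)
    p2 = sum(holds_feature(adj, u, v) for u, v in nonexistent) / len(nonexistent)
    return p1, p2


def indicator_auc(p1, p2):
    """AUC of the binary score 'holds the feature or not' (ties count 0.5).
    Standard identity, NOT the paper's upper-bound expression."""
    return 0.5 + (p1 - p2) / 2


# Hypothetical toy network: 5 nodes, 4 observed links, 2 held-out links.
nodes = range(5)
observed_edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
missing_edges = [(1, 2), (3, 4)]

p1, p2 = feature_prevalence(nodes, observed_edges, missing_edges)
print(p1, p2, indicator_auc(p1, p2))  # prints 0.5 0.25 0.625
```

Enumerating every unconnected pair costs time quadratic in the number of nodes, so on a large network one would likely estimate `p2` from a random sample of nonexistent links instead.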
Stats
The number of closed triangles (N△) and open triangles (N∧) in the network.
The number of times a link is shared by multiple triangles (S△) and the number of times an unconnected node pair is shared by other open triangles (S∧).
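The triangle counts listed above can be computed with a short Python sketch. It assumes an undirected simple graph with no self-loops, and it assumes that an open triangle is a path of length two whose end nodes are not connected; the paper's exact definitions may differ, and the function names are hypothetical. The sketch also computes the global clustering coefficient (transitivity) from the two counts. The quantities S△ and S∧ are not computed here, because this summary does not define them precisely enough.

```python
from itertools import combinations


def triangle_counts(edges):
    """Return (n_closed, n_open) for an undirected simple graph.

    n_closed: number of closed triangles.
    n_open:   number of length-2 paths whose end nodes are not linked
              (assumed definition of an open triangle).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    closed_wedges = 0
    n_open = 0
    for neighbors in adj.values():
        # Every pair of neighbors of a node forms a wedge centered on it.
        for a, b in combinations(neighbors, 2):
            if b in adj[a]:
                closed_wedges += 1
            else:
                n_open += 1
    # Each closed triangle is seen once from each of its three corners.
    return closed_wedges // 3, n_open


def transitivity(n_closed, n_open):
    """Global clustering coefficient: closed wedges over all wedges."""
    wedges = 3 * n_closed + n_open
    return 3 * n_closed / wedges if wedges else 0.0


# Hypothetical toy network with two closed triangles: (0,1,2) and (1,2,3).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
n_closed, n_open = triangle_counts(edges)
print(n_closed, n_open, transitivity(n_closed, n_open))  # prints 2 4 0.6
```

Two networks can share the same transitivity while having very different absolute values of `n_closed` and `n_open`, which is one way to see why a single ratio may not capture everything the separate counts do.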
Quotes
"The maximum capability of a topological feature follows a simple yet theoretically validated expression, which only depends on the extent to which the feature is held in missing and nonexistent links."
"The potential of all other indexes can be readily estimated through one single measurement of the feature's prevalence in the positive and negative samples."
"The supervised prediction in principle gives a more accurate result compared with the unsupervised one. The maximum capability of the topological feature is lifted by utilizing the supervised method, which can be mathematically quantified."