Key Concepts
Subgraph2vec is a random walk-based algorithm that embeds knowledge graphs by letting users define an arbitrary schema subgraph in which the walks take place, making it more flexible and generic than earlier methods that depend on predefined patterns.
Summary
The paper introduces Subgraph2vec, a random walk-based algorithm for embedding knowledge graphs. Unlike earlier methods such as node2vec, metapath2vec, and regpattern2vec, which rely on predefined patterns or biases, Subgraph2vec lets users define an arbitrary schema subgraph that guides the random walks.
The key aspects of the Subgraph2vec approach are:
- The user specifies a schema subgraph by providing a set of edge IDs that define the subgraph of interest within the larger knowledge graph.
- The algorithm then performs random walks within this user-defined subgraph, choosing the next edge to follow based on the probability distribution of edge types connected to the current node.
- The resulting walk sequences are then used as input to a modified skip-gram model to learn the node embeddings.
- The authors evaluate the Subgraph2vec embeddings on the task of link prediction, comparing against previous methods on the YAGO and NELL datasets. The results show that Subgraph2vec outperforms the other approaches in most cases.
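The walk and skip-gram steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy graph, the names `EDGES`, `build_adjacency`, `random_walk`, and `skipgram_pairs`, the undirected treatment of edges, and the uniform choice among allowed neighbors are all assumptions made for illustration. The paper's edge-type probability distribution and its modified skip-gram model are not reproduced here; the sketch only generates the (center, context) pairs that a standard skip-gram model would train on.

```python
import random
from collections import defaultdict

# Hypothetical toy knowledge graph: edge_id -> (head, relation, tail).
EDGES = {
    0: ("alice", "worksAt", "acme"),
    1: ("bob", "worksAt", "acme"),
    2: ("acme", "locatedIn", "berlin"),
    3: ("alice", "bornIn", "paris"),
    4: ("paris", "capitalOf", "france"),
}


def build_adjacency(edges, schema_edge_ids):
    """Index only the edges that belong to the user-defined schema subgraph.

    Assumption: edges are walked in both directions.
    """
    adjacency = defaultdict(list)
    for edge_id in schema_edge_ids:
        head, _relation, tail = edges[edge_id]
        adjacency[head].append(tail)
        adjacency[tail].append(head)
    return adjacency


def random_walk(adjacency, start, walk_length, rng):
    """Walk inside the subgraph, picking the next neighbor uniformly at random."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = adjacency.get(walk[-1])
        if not neighbors:  # dead end inside the subgraph
            break
        walk.append(rng.choice(neighbors))
    return walk


def skipgram_pairs(walk, window):
    """Turn one walk into (center, context) pairs for a plain skip-gram model."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


rng = random.Random(0)            # fixed seed so the sketch is repeatable
schema_edge_ids = {0, 1, 2}       # the user selects the subgraph by edge ID
adjacency = build_adjacency(EDGES, schema_edge_ids)
walks = [random_walk(adjacency, node, 5, rng) for node in sorted(adjacency)]
pairs = [pair for walk in walks for pair in skipgram_pairs(walk, window=2)]
```

Because edges 3 and 4 are left out of `schema_edge_ids`, no walk can reach `paris` or `france`, which is the sense in which the schema subgraph guides the embedding. The resulting walks could equally be passed as "sentences" to any off-the-shelf skip-gram trainer.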
The key advantage of Subgraph2vec is its flexibility, as it allows users to guide the embedding process by specifying the relevant subgraph of interest, rather than relying on predefined patterns. This makes the algorithm more generic and applicable to a wider range of knowledge graph scenarios.
Statistics
The YAGO dataset contains 123,182 unique entities and 1,084,040 unique edges with 37 different relation types.
The NELL dataset contains 49,869 unique nodes, 296,013 edges, and 827 relation types.
Quotes
"In our algorithm, however, we define a method in which the algorithm runs on any arbitrary random walk path inside a user-defined schema subgraph based on edges."
"The advantage of using a subgraph is that it is more permissive; since we can run the walks totally randomly inside the user-defined subgraph rather than having biased walks based on a rigid pattern like the previous mentioned methods."