Core Concept
Sharing conjugate parameters among the complex numbers used in knowledge graph embedding (KGE) models halves the memory needed for relation embeddings while achieving performance comparable to state-of-the-art non-conjugate models, with faster or at least comparable training time.
Summary
The content discusses a parameter-sharing method for complex numbers employed in Knowledge Graph Embedding (KGE) models. The key points are:
- KGE models that use complex-number representations achieve state-of-the-art performance but incur high memory costs. To address this, the authors propose a parameter-sharing method that uses conjugate parameters in the transformation functions.
- With conjugate parameters, the method reduces the space complexity from $O(n_e d_e + n_r d_r)$ to $O(n_e d_e + n_r d_r / 2)$, effectively halving the relation embedding size.
- The authors demonstrate their method on two best-performing KGE models, ComplEx and 5⋆E, across five benchmark datasets. The results show that the conjugate models (Complϵx and 5⋆ϵ) achieve accuracy comparable to the original models, while reducing training time by 31% on average for 5⋆E.
- Ablation studies confirm that the conjugate models retain the expressiveness of the original models, and that the parameter-sharing approach is more effective than simply reducing the number of parameters in the regularization process.
- The authors conclude that their conjugate parameter-sharing method can help scale up KGs with fewer computational resources while maintaining state-of-the-art performance.
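The parameter-sharing idea in the points above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it uses the standard ComplEx score, Re(sum_k h_k r_k conj(t_k)), and assumes that the second half of each relation vector is the complex conjugate of the first half. The helper names (`complex_score`, `expand_conjugate`, `random_complex_vector`) and the toy dimension `d = 8` are hypothetical.

```python
import random


def complex_score(h, r, t):
    """ComplEx-style triple score: Re(sum_k h_k * r_k * conj(t_k))."""
    return sum(hk * rk * tk.conjugate() for hk, rk, tk in zip(h, r, t)).real


def expand_conjugate(r_half):
    """Build a full relation vector from half as many stored parameters.

    Assumption for this sketch: the second half of the vector is the
    complex conjugate of the first half, so only r_half is stored.
    """
    return r_half + [z.conjugate() for z in r_half]


def random_complex_vector(n, rng):
    """Return n complex numbers with real and imaginary parts in [-1, 1]."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


rng = random.Random(0)
d = 8                                         # toy complex embedding dimension
h = random_complex_vector(d, rng)             # head entity embedding
t = random_complex_vector(d, rng)             # tail entity embedding
r_half = random_complex_vector(d // 2, rng)   # only d/2 relation values are stored
r_full = expand_conjugate(r_half)             # expanded to d values when scoring

score = complex_score(h, r_full, t)
stored_reals = 2 * len(r_half)                # real-valued parameters kept in memory
full_reals = 2 * len(r_full)                  # real-valued parameters without sharing
print(f"score={score:.4f}, stored={stored_reals}, unshared={full_reals}")
```

The entity embeddings are left untouched, which is why only the $n_r d_r$ term of the space complexity is halved.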
Statistics
The theoretical space complexity of KGE models is often $O(n_e d_e + n_r d_r)$, which is proportional to the number of KG elements (entities $n_e$ and relations $n_r$) and the embedding dimensions ($d_e$ and $d_r$).
Using the ComplEx model on the FB15K dataset with $n_e = 14{,}951$, $n_r = 1{,}345$, and $d_e = d_r = 4{,}000$ results in 65,184,000 parameters, requiring around 497 MB of memory.
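The figures above can be checked with a short calculation. The sketch below assumes 8 bytes per parameter (64-bit floats) and binary megabytes (2^20 bytes), which is the combination that reproduces the reported 497 MB; the summary itself does not state the precision. The variable names are illustrative.

```python
# FB15K figures quoted above
n_e, n_r = 14_951, 1_345      # number of entities and relations
d_e = d_r = 4_000             # embedding dimensions
BYTES_PER_PARAM = 8           # assumption: 64-bit floats

entity_params = n_e * d_e                 # 59,804,000
relation_params = n_r * d_r               # 5,380,000
total = entity_params + relation_params   # 65,184,000
total_mib = total * BYTES_PER_PARAM / 2**20

# Conjugate sharing stores only half of the relation parameters.
relation_shared = relation_params // 2    # 2,690,000
total_shared = entity_params + relation_shared
shared_mib = total_shared * BYTES_PER_PARAM / 2**20

print(f"original: {total:,} parameters, {total_mib:.1f} MiB")
print(f"shared:   {total_shared:,} parameters, {shared_mib:.1f} MiB")
```

With these numbers the entity table accounts for most of the parameters, so halving the relation table removes 2,690,000 of the 65,184,000 parameters. The saving grows with the number of relations $n_r$.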
The authors' conjugate models, Complϵx and 5⋆ϵ, can reduce the relation embedding size by half compared to the original models.
Quotes
"Scaling a KG is problematic as $n_e$, $n_r$ can go up to millions; also because KGE models are often shallow machine learning models composed of simple operations, e.g., matrix multiplication."
"Inspired by the improved performance of complex number representation and Non-Euclidean models where transformation parameters attempt to interact rather than be independent, we intuited the idea of sharing parameters for memory efficiency."
"By using our method, models can reduce their space complexity to $O(n_e d_e + n_r d_r / 2)$, which means the relation embedding size is half the original model."