The paper studies directed graph clustering from a statistical perspective. It formulates clustering as the problem of estimating the underlying communities in the directed stochastic block model (DSBM). The authors perform maximum likelihood estimation (MLE) on the DSBM to find the most probable community assignment given the observed graph structure.
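As an illustration of what estimating communities by likelihood means, the following minimal sketch samples a small directed graph and scores two candidate assignments. The parameterization is an assumption made for this example and is not necessarily the paper's: each unordered pair is connected with probability `p`, edges inside a community get a random direction, and edges between communities point from the lower label to the higher label with probability `1 - eta`. The function names `sample_dsbm` and `log_likelihood` are hypothetical.

```python
import numpy as np


def sample_dsbm(labels, p, eta, rng):
    """Sample an adjacency matrix A, where A[i, j] = 1 means an edge i -> j.

    Assumed model (for illustration only): each unordered pair is joined
    with probability p. Inside a community the direction is a fair coin
    flip; between communities the edge points from the lower label to the
    higher label with probability 1 - eta.
    """
    n = len(labels)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() >= p:
                continue  # no edge between i and j
            if labels[i] == labels[j]:
                forward = rng.random() < 0.5
            else:
                low_to_high = rng.random() < 1 - eta
                forward = low_to_high if labels[i] < labels[j] else not low_to_high
            if forward:
                A[i, j] = 1
            else:
                A[j, i] = 1
    return A


def log_likelihood(A, labels, p, eta):
    """Log-likelihood of a candidate assignment under the assumed model."""
    n = len(labels)
    ll = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if A[i, j] == 0 and A[j, i] == 0:
                ll += np.log(1 - p)
                continue
            ll += np.log(p)
            if labels[i] == labels[j]:
                ll += np.log(0.5)
            else:
                lo, hi = (i, j) if labels[i] < labels[j] else (j, i)
                ll += np.log(1 - eta) if A[lo, hi] == 1 else np.log(eta)
    return ll


rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
A = sample_dsbm(labels, p=0.5, eta=0.1, rng=rng)

# Compare the planted assignment with the assignment that swaps the two labels.
print(log_likelihood(A, labels, 0.5, 0.1))
print(log_likelihood(A, 1 - labels, 0.5, 0.1))
```

Under this assumed model the planted assignment should score higher than the swapped one, because most cross-community edges follow the planted direction. An exhaustive search over assignments is infeasible for realistic graph sizes, which is one reason to look for efficient algorithms.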
Beyond the statistical view, the authors establish an equivalence between this MLE formulation and a novel flow optimization heuristic that jointly considers edge density and edge orientation. Building on this new formulation of directed clustering, they introduce two efficient and interpretable directed clustering algorithms: a spectral clustering algorithm and a semidefinite-programming-based clustering algorithm.
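To see how a likelihood can reduce to a flow objective, consider a simplified two-community model, assumed here for illustration and not necessarily the paper's parameterization. Every unordered pair of vertices is joined with probability $p$; an edge inside a community receives a uniformly random direction; an edge between communities $C_1$ and $C_2$ points from $C_1$ to $C_2$ with probability $1-\eta$, where $\eta < 1/2$. Let $N$ be the number of vertex pairs, $E$ the number of edges, and $F$ and $B$ the numbers of cross-community edges pointing forward ($C_1 \to C_2$) and backward ($C_2 \to C_1$) under a candidate assignment.

```latex
\begin{align*}
\log L
  &= (N - E)\log(1 - p) + E\log p
     + (E - F - B)\log\tfrac{1}{2}
     + F\log(1 - \eta) + B\log\eta \\
  &= \underbrace{(N - E)\log(1 - p) + E\log\tfrac{p}{2}}_{\text{independent of the assignment}}
     + F\log\bigl(2(1 - \eta)\bigr)
     - B\log\frac{1}{2\eta}.
\end{align*}
```

Since $\eta < 1/2$, both $\log\bigl(2(1-\eta)\bigr)$ and $\log\frac{1}{2\eta}$ are positive, so maximizing the likelihood amounts to rewarding forward cross-community edges and penalizing backward ones. In this simplified model edge density is the same everywhere, so only orientation survives in the objective; a model with community-dependent densities would contribute density terms as well.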
Using tools from matrix perturbation theory, the authors prove an upper bound on the number of vertices misclustered by the spectral clustering algorithm. They compare their proposed algorithms, quantitatively and qualitatively, with existing directed clustering methods on synthetic and real-world data, lending further support to their theoretical contributions.
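The number of misclustered vertices is commonly counted up to a relabeling of the clusters, since cluster names are arbitrary. The sketch below follows that convention; whether the paper uses exactly this definition is an assumption, and the helper name `num_misclustered` is hypothetical. It tries every permutation of the labels, which is only practical for a small number of clusters.

```python
from itertools import permutations

import numpy as np


def num_misclustered(true_labels, pred_labels, k):
    """Count misclustered vertices, minimized over all relabelings of k clusters."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best = len(true_labels)
    for perm in permutations(range(k)):
        # Rename predicted cluster c to perm[c], then count disagreements.
        relabeled = np.array([perm[c] for c in pred_labels])
        best = min(best, int(np.sum(relabeled != true_labels)))
    return best


truth = [0, 0, 0, 1, 1, 1]
pred = [1, 1, 0, 0, 0, 0]
print(num_misclustered(truth, pred, k=2))  # prints 1
```

Here swapping the predicted labels leaves only the third vertex on the wrong side, so the count is 1 rather than 5.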