Leveraging One-class Homophily for Unsupervised Graph Anomaly Detection


Core Concepts
Normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker. Leveraging this one-class homophily property, we introduce a novel unsupervised anomaly scoring measure, local node affinity, and propose Truncated Affinity Maximization (TAM) to learn tailored node representations that maximize the local affinity of normal nodes for accurate graph anomaly detection.
Abstract
The paper reveals an important anomaly-discriminative property in graph anomaly detection (GAD) datasets, one-class homophily: normal nodes tend to have strong connection/affinity with each other, while the homophily in abnormal nodes is significantly weaker. To exploit this property, the paper introduces a novel unsupervised anomaly scoring measure, local node affinity, which assigns a larger anomaly score to nodes that are less affiliated with their neighbors.
One-class homophily is hard to capture in the raw attribute space or in a generic node representation space, because non-homophily edges (edges connecting normal and abnormal nodes) can bias the node representations. To address this issue, the paper proposes Truncated Affinity Maximization (TAM), which learns tailored node representations that maximize the local affinity of normal nodes. TAM consists of two key components:
Local Affinity Maximization Networks (LAMNet): LAMNet learns a GNN-based mapping function that maximizes the affinity of nodes with homophily relations to their neighbors, while keeping the affinity of nodes with non-homophily edges weak.
Normal Structure-preserved Graph Truncation (NSGT): NSGT iteratively removes non-homophily edges while preserving the homophily graph structure, to mitigate the bias that non-homophily edges introduce into the message passing of LAMNet.
The node representations learned by TAM yield significantly stronger local affinity for normal nodes than for abnormal nodes, enabling accurate graph anomaly detection. Extensive experiments on 10 real-world GAD datasets show that TAM substantially outperforms seven competing models, achieving an increase of over 10% in AUROC/AUPRC compared to the best contenders on challenging datasets. TAM also demonstrates robustness to camouflaged attributes and efficiency on large-scale graphs.
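The scoring idea can be illustrated with a minimal sketch. It assumes that affinity is measured as the mean cosine similarity between a node and its neighbors; the summary above does not specify the similarity function, so this choice, the function names, and the toy graph are all illustrative assumptions rather than the paper's implementation. The sketch also works on raw features, whereas TAM scores nodes in a learned representation space.

```python
import numpy as np

def local_affinity(features, adjacency):
    """Mean cosine similarity between each node and its neighbors.

    Cosine similarity is an assumption made for this sketch.
    Nodes without neighbors get an affinity of 0.
    """
    features = np.asarray(features, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)   # row-normalize, avoid division by zero
    similarity = unit @ unit.T                   # pairwise cosine similarity
    degree = adjacency.sum(axis=1)
    neighbor_similarity = (adjacency * similarity).sum(axis=1)
    return neighbor_similarity / np.maximum(degree, 1.0)

def anomaly_scores(features, adjacency):
    """Lower affinity to neighbors means a larger anomaly score."""
    return -local_affinity(features, adjacency)

# Hypothetical toy graph: nodes 0-2 are similar and mutually connected;
# node 3 has different attributes and is attached to node 0.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0]])
scores = anomaly_scores(X, A)
print(scores)
```

In this toy graph node 3 receives the largest score. Node 0 is also penalized because one of its three edges is a non-homophily edge, which is the kind of bias that graph truncation is meant to reduce.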
Stats
The paper reports the following key observations:
On the BlogCatalog and Amazon datasets, the Euclidean distance between the two endpoints of a normal-normal (homophily) edge is substantially smaller than the distance between the endpoints of a normal-abnormal (non-homophily) edge.
As the number of truncation iterations/depths in the NSGT component increases, the homophily of normal nodes increases while the number of non-homophily edges decreases.
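A comparison of this kind can be sketched as follows, assuming node labels are available for analysis only (the detector itself is unsupervised). The function name, the edge-list format, and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def edge_distance_by_type(features, edges, is_abnormal):
    """Mean endpoint distance for homophily and non-homophily edges.

    A homophily edge joins two normal nodes; a non-homophily edge joins
    one normal and one abnormal node. Both groups must be non-empty.
    """
    features = np.asarray(features, dtype=float)
    edges = np.asarray(edges)
    is_abnormal = np.asarray(is_abnormal, dtype=bool)
    src, dst = edges[:, 0], edges[:, 1]
    dist = np.linalg.norm(features[src] - features[dst], axis=1)
    homophily = ~is_abnormal[src] & ~is_abnormal[dst]
    non_homophily = is_abnormal[src] ^ is_abnormal[dst]
    return dist[homophily].mean(), dist[non_homophily].mean()

# Hypothetical toy data: node 3 is the only abnormal node.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
E = [(0, 1), (0, 2), (1, 2), (0, 3)]
labels = [False, False, False, True]
homo_dist, non_homo_dist = edge_distance_by_type(X, E, labels)
print(homo_dist, non_homo_dist)
```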
Quotes
"We, for the first time, empirically reveal the one-class homophily phenomenon that provides an anomaly-discriminative property for GAD."
"Motivated by the one-class homophily property, we introduce a novel unsupervised anomaly scoring measure, local node affinity, that assigns a larger anomaly score to nodes that are less affiliated with their neighbors."
"TAM makes full use of the one-class homophily to learn expressive normal representations by maximizing local node affinity on truncated graphs, offering discriminative local affinity scores for accurate GAD."

Key Insights Distilled From

by Hezhe Qiao,G... at arxiv.org 04-05-2024

https://arxiv.org/pdf/2306.00006.pdf
Truncated Affinity Maximization

Deeper Inquiries

How can the proposed one-class homophily property be leveraged to enhance other existing graph anomaly detection methods beyond TAM?

The one-class homophily property could also benefit GAD methods other than TAM, potentially improving their detection accuracy and robustness.
One option is to integrate the property into the loss or objective functions of existing models. For instance, the anomaly scoring measure based on local node affinity can serve as an additional feature or input to traditional anomaly detection algorithms, adding a signal based on how strongly nodes of the same class are connected.
The property can also be used to refine the node representations learned by graph neural networks (GNNs) in other GAD models. Optimizing the representations to maximize the local affinity of nodes with similar attributes can help GNNs capture the underlying patterns in the data and improve anomaly detection performance. It can further guide the design of new loss functions or regularization techniques that preserve homophily relations while minimizing the impact of non-homophily edges.
In summary, integrating one-class homophily into existing graph anomaly detection methods uses the inherent structure of the data to distinguish normal from abnormal nodes, which may make detection more accurate.

What are the potential limitations of the one-class homophily assumption, and how can TAM be adapted to handle datasets with strong heterophily relations or very large graphs?

The one-class homophily assumption, while beneficial for many datasets, may have limitations when strong heterophily relations or very large graphs are present. The property may not hold for datasets with diverse or mixed communities, where nodes are strongly connected to nodes with different attributes or characteristics. In such cases, the assumption may not accurately capture the underlying patterns of normal and abnormal nodes, leading to suboptimal anomaly detection performance.
Several strategies can be considered to adapt TAM to such datasets. One approach is to incorporate additional features or attributes that capture the heterophily relations in the data. With information about diverse communities or subgroups within the graph, TAM can better differentiate between normal and abnormal nodes with varying characteristics.
Another adaptation is to modify the NSGT truncation approach so that it selectively preserves edges representing genuine homophily relations while removing non-homophily edges. This could be achieved with a more sophisticated edge removal criterion that considers both the strength of homophily connections and the effect of removing specific edges on the overall graph structure.

Can the truncation approach in NSGT be further improved to better preserve the genuine homophily structure while removing non-homophily edges?

The truncation approach in NSGT could be improved with more advanced edge removal strategies. One potential enhancement is a dynamic edge removal mechanism that considers the local affinity scores of nodes when deciding which edges to truncate. By prioritizing the removal of edges with low local affinity scores, NSGT can focus on preserving the homophily relations that are crucial for accurate anomaly detection.
A feedback mechanism that iteratively refines the edge removal process based on its effect on the overall graph structure could help NSGT balance preserving homophily against removing non-homophily edges. Such refinement would reevaluate earlier removal decisions as local affinity scores change and adjust the truncation process accordingly.
Adaptive edge removal strategies could additionally account for the heterogeneity of the graph data and the distribution of homophily and non-homophily edges, adjusting the removal criteria to the local characteristics of the graph.
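As a rough illustration of distance-guided truncation, the sketch below repeatedly drops the fraction of edges whose endpoints are farthest apart. It is a simplified, deterministic stand-in written for this summary and is not the NSGT procedure from the paper; the function name, the `drop_fraction` and `rounds` parameters, and the toy data are hypothetical.

```python
import numpy as np

def truncate_edges(features, edges, drop_fraction=0.25, rounds=2):
    """Iteratively remove the edges with the largest endpoint distance.

    Simplified sketch: each round drops a fixed fraction of the
    remaining edges, ranked by Euclidean distance between endpoints.
    """
    features = np.asarray(features, dtype=float)
    edges = [tuple(e) for e in edges]
    for _ in range(rounds):
        n_drop = int(len(edges) * drop_fraction)
        if n_drop == 0:
            break  # too few edges left to drop a whole edge
        dist = np.array([np.linalg.norm(features[u] - features[v])
                         for u, v in edges])
        # Keep the closest edges, preserving their original order.
        keep = np.argsort(dist)[: len(edges) - n_drop]
        edges = [edges[i] for i in sorted(keep)]
    return edges

# Hypothetical toy data: the edge (0, 3) links dissimilar nodes.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.0, 1.0]])
E = [(0, 1), (0, 2), (1, 2), (0, 3)]
kept = truncate_edges(X, E, drop_fraction=0.25, rounds=1)
print(kept)
```

A fixed drop fraction keeps the sketch short but will also remove genuine homophily edges once the dissimilar ones are gone, which is why the stopping rule and removal criterion matter in practice.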