Core Concepts
LLMTTT is a novel pipeline that leverages the annotation capabilities of Large Language Models (LLMs) to enhance test-time training and mitigate the out-of-distribution (OOD) problem on graphs.
Summary
The paper proposes a novel pipeline called LLMTTT that leverages the annotation capabilities of Large Language Models (LLMs) to enhance test-time training and address the out-of-distribution (OOD) problem on graphs.
Key highlights:
- LLMTTT introduces a hybrid active node selection strategy that considers node diversity, representativeness, and the prediction signals from the pre-trained model to select the most valuable nodes for annotation by LLMs.
- LLMTTT designs a two-stage training strategy to effectively adapt the pre-trained model under the noisy and limited labels provided by LLMs.
- Extensive experiments and theoretical analysis demonstrate that LLMTTT improves performance on various OOD graph datasets compared to existing test-time training methods.
The paper first provides an overview of the LLMTTT pipeline, which consists of a pre-training phase, a fully test-time training phase, and an inference phase. The key components of the test-time training phase are then detailed:
- Hybrid active node selection: Combines uncertainty-based and distribution-based active learning to select the most valuable nodes for annotation.
- Confidence-aware high-quality annotation: Leverages prompting strategies and confidence scores from LLMs to obtain high-quality pseudo labels.
- Two-stage training: Includes training with filtered nodes to reduce the impact of noisy labels, and self-training with unlabeled nodes to further leverage the information from the test data.
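The hybrid active node selection step above can be sketched as a simple scoring rule. The mixing weight `alpha`, the centroid-similarity proxy for representativeness, and the function name `hybrid_select` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def hybrid_select(probs, feats, k, alpha=0.5):
    """Illustrative hybrid active selection: mix prediction
    uncertainty with feature-space representativeness.

    probs: (N, C) softmax outputs of the pre-trained GNN on test nodes
    feats: (N, D) node embeddings
    k: annotation budget (number of nodes sent to the LLM)
    alpha: hypothetical weight balancing the two signals
    """
    # Uncertainty: predictive entropy; high entropy = model is unsure.
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)

    # Representativeness: cosine similarity to the feature centroid,
    # a cheap stand-in for density/diversity-based criteria.
    centroid = feats.mean(axis=0)
    sim = feats @ centroid / (
        np.linalg.norm(feats, axis=1) * np.linalg.norm(centroid) + 1e-12
    )

    # Normalize both signals to [0, 1] before mixing.
    def norm(x):
        return (x - x.min()) / (x.max() - x.min() + 1e-12)

    score = alpha * norm(entropy) + (1 - alpha) * norm(sim)
    return np.argsort(-score)[:k]  # indices of the top-k nodes
```

The selected indices would then be handed to the LLM annotator; in practice the paper combines uncertainty- and distribution-based criteria rather than this single weighted sum.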
Theoretical analysis is provided to demonstrate that incorporating labeled test samples during the test-time training phase can significantly improve the overall performance across the test domain compared to traditional test-time training methods.
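The two-stage training strategy can be sketched as the following loop, shown here on a generic classifier rather than a GNN for brevity. The thresholds, epoch counts, and the function name `two_stage_tta` are illustrative assumptions; the point is the structure: first adapt on confidence-filtered LLM labels, then self-train on the model's own high-confidence pseudo labels:

```python
import torch
import torch.nn.functional as F

def two_stage_tta(model, optimizer, x, llm_labels, llm_conf,
                  conf_threshold=0.8, self_train_threshold=0.9,
                  epochs=(10, 10)):
    """Illustrative two-stage test-time training loop.

    x: (N, D) test-node inputs
    llm_labels: (N,) labels proposed by the LLM annotator
    llm_conf: (N,) confidence scores reported by the LLM
    """
    # Stage 1: train only on LLM-annotated nodes whose reported
    # confidence passes the filter, limiting the impact of noisy labels.
    keep = llm_conf >= conf_threshold
    for _ in range(epochs[0]):
        if not keep.any():
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x)[keep], llm_labels[keep])
        loss.backward()
        optimizer.step()

    # Stage 2: self-training on the remaining nodes using the adapted
    # model's own high-confidence pseudo labels.
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = (~keep) & (conf >= self_train_threshold)
    for _ in range(epochs[1]):
        if not mask.any():
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x)[mask], pseudo[mask])
        loss.backward()
        optimizer.step()
    return model
```

Freezing the stage-2 pseudo labels before the loop (rather than recomputing them each epoch) is one common way to keep self-training stable; the paper's exact schedule may differ.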
Stats
"Graph is a kind of prevalent multi-modal data, consisting of modalities of both the topological structure and node features."
"Text-Attributed Graphs (TAGs) are graphs of which node attributes are described from the text modality, such as paper citation graphs containing paper descriptions and social network data including user descriptions."
Quotes
"Graph Neural Networks (GNNs) have demonstrated great power in graph representation learning, and have achieved revolutionary progress in various graph-related applications, such as social network analysis [16], recommendation [39, 64] and drug discovery [8, 15]."
"Despite remarkable achievements, GNNs have shown vulnerability in Out-Of-Distribution (OOD) generalization, as it is observed that GNNs can confront significant performance decline when there exists distribution shift between the training phase and the test phase [19, 33]."