Core Concepts
Restricting the feature space of AI-generated text detectors, whether by pruning components of the underlying encoder (such as layers or attention heads) or by removing specific coordinates and concept directions from the text embeddings, can significantly improve their robustness and their ability to generalize to unseen domains and generation models.
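A minimal sketch of the idea, assuming precomputed sentence embeddings and a hypothetical set of coordinates to remove; the paper's actual coordinate-selection procedure is not reproduced here:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def restrict_features(X: np.ndarray, removed_coords: list) -> np.ndarray:
    """Zero out the given embedding coordinates (columns) of X."""
    X = X.copy()
    X[:, removed_coords] = 0.0
    return X

# Hypothetical data: 768-dim embeddings with binary human/AI labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 768))
y_train = rng.integers(0, 2, size=200)

removed = [5, 17, 300]  # placeholder coordinates suspected of carrying spurious signal
clf = LogisticRegression(max_iter=1000)
clf.fit(restrict_features(X_train, removed), y_train)
```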
Statistics
Removing the first layer of RoBERTa improved average cross-domain accuracy by 3% on SemEval.
Pruning layers 3 and 4 of RoBERTa yielded more stable gains across both cross-domain and cross-model settings (a layer-pruning sketch follows this list).
Erasing the TopConst (top-constituent) concept improved cross-domain transfer accuracy by up to 13% on SemEval, particularly for transfers from Wikipedia and arXiv.
Erasing the WC (word-content) concept led to the largest cross-domain improvement on SemEval, indicating that word semantics contribute to domain-specific spurious features (see the concept-erasure sketch below).
Head selection based on a held-out validation set with samples from all generators and domains in GPT-3D achieved the best scores among all methods (a head-selection sketch follows).
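The layer-pruning settings in the first two statistics can be reproduced structurally with Hugging Face transformers; a sketch, assuming 0-indexed encoder layers (the paper's indexing convention may differ):

```python
import torch.nn as nn
from transformers import RobertaModel

def drop_layers(model: RobertaModel, layers_to_drop: set) -> RobertaModel:
    """Remove the given encoder layers from a RoBERTa model in place."""
    kept = [layer for i, layer in enumerate(model.encoder.layer)
            if i not in layers_to_drop]
    model.encoder.layer = nn.ModuleList(kept)
    model.config.num_hidden_layers = len(kept)
    return model

model = RobertaModel.from_pretrained("roberta-base")
model = drop_layers(model, {3, 4})  # the layers-3-and-4 setting above
```

The pruned model can then be fine-tuned as a detector in the usual way.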
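The concept-erasure results correspond to removing a linear direction associated with a probing concept (such as TopConst or WC) from the embeddings. Below is a simplified rank-1 sketch via orthogonal projection; the paper's exact erasure method may be stronger, e.g. a multi-directional linear eraser:

```python
import numpy as np

def erase_concept(X: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Remove the linear direction in X most correlated with concept labels z.

    X: (n, d) embeddings; z: (n,) scalar concept labels, e.g. probing targets.
    """
    Xc = X - X.mean(axis=0)
    zc = z - z.mean()
    w = Xc.T @ zc                  # direction correlated with the concept
    w /= np.linalg.norm(w)
    return X - np.outer(X @ w, w)  # project onto the orthogonal complement
```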
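Head selection can be approximated with the head_mask argument that transformers models accept; below is a hypothetical greedy search that drops any head whose removal does not hurt held-out accuracy. The paper's actual selection criterion is not specified here, and the data is a placeholder:

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

# Hypothetical held-out batch; in practice it would cover all generators and domains.
model = RobertaForSequenceClassification.from_pretrained("roberta-base")
tok = RobertaTokenizer.from_pretrained("roberta-base")
batch = tok(["A held-out example.", "Another one."],
            return_tensors="pt", padding=True)
labels = torch.tensor([0, 1])  # 0 = human, 1 = AI-generated

n_layers = model.config.num_hidden_layers
n_heads = model.config.num_attention_heads

def val_acc(head_mask: torch.Tensor) -> float:
    """Detector accuracy on the held-out batch under a given head mask."""
    with torch.no_grad():
        logits = model(**batch, head_mask=head_mask).logits
    return (logits.argmax(-1) == labels).float().mean().item()

mask = torch.ones(n_layers, n_heads)
best = val_acc(mask)
for layer in range(n_layers):
    for head in range(n_heads):
        mask[layer, head] = 0.0      # tentatively drop this head
        score = val_acc(mask)
        if score >= best:
            best = score             # dropping it helps (or is neutral)
        else:
            mask[layer, head] = 1.0  # restore the head
```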