The paper addresses the challenge of generalization in audio deepfake detection models. It proposes a neural collapse-based sampling approach to create a new training database from diverse datasets, which can improve the generalization capability of audio deepfake detection models.
The key highlights are:
Audio deepfake detection models trained on specific datasets often struggle to generalize to unseen data distributions, due to the high within-class variability in the fake audio class.
The authors leverage neural collapse theory to formulate a sampling approach that identifies representative real and fake audio samples from diverse datasets, based on the geometry of penultimate-layer embeddings from a pre-trained deepfake classifier.
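To make the idea concrete, here is a minimal sketch of what such a sampling step could look like, assuming representatives are chosen by proximity to the class mean in the penultimate embedding space; the function name `nc_sample`, the Euclidean criterion, and the nearest-to-mean rule are illustrative assumptions, not the paper's exact selection rule:

```python
import numpy as np

def nc_sample(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Select the k samples whose penultimate-layer embeddings lie closest
    to the class mean. Under neural collapse, within-class embeddings of a
    well-trained classifier concentrate around this mean, so the nearest
    samples serve as representatives of the class."""
    mu = embeddings.mean(axis=0)                     # class mean of the embeddings
    dists = np.linalg.norm(embeddings - mu, axis=1)  # distance of each sample to the mean
    return np.argsort(dists)[:k]                     # indices of the k most central samples

# Hypothetical usage: `real_emb` would hold penultimate-layer embeddings of
# real audio pooled from several source datasets, extracted with the
# pre-trained detector.
# real_emb = np.load("real_embeddings.npy")
# keep_idx = nc_sample(real_emb, k=5000)
```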
Experiments using the ASVspoof 2019 LA, FoR, and WaveFake datasets demonstrate that the proposed approach achieves comparable generalization performance on unseen data, such as the In-the-Wild dataset, while being computationally efficient and requiring less training data than existing methods.
The authors also propose a modified sampling algorithm for the fake class, which applies k-means clustering so that the high within-class variability of fake audio (many distinct spoofing methods) is covered when selecting representatives.
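A hedged sketch of how this fake-class variant could be realized: cluster the fake embeddings, then pick the samples nearest each centroid so every spoofing-method mode contributes representatives. The cluster count, the per-cluster quota, and the nearest-to-centroid rule are assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

def nc_sample_fake(embeddings: np.ndarray, n_clusters: int,
                   k_per_cluster: int) -> np.ndarray:
    """Cluster fake-class embeddings with k-means, then select the samples
    closest to each centroid, so each cluster (e.g., a spoofing method)
    is represented despite high within-class variability."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    selected = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]  # members of cluster c
        d = np.linalg.norm(embeddings[idx] - km.cluster_centers_[c], axis=1)
        selected.extend(idx[np.argsort(d)[:k_per_cluster]])  # most central members
    return np.array(selected)
```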
The proposed methodology has the potential to enhance the generalization of audio deepfake detection models across diverse data distributions, while reducing the computational burden associated with training on large datasets.