Prompt tuning, a parameter-efficient fine-tuning method, effectively adapts pre-trained audio deepfake detection models to new domains with limited data, addressing the challenges of domain gap, data scarcity, and high computational cost.
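As a rough illustration of the idea, the sketch below freezes a pre-trained backbone and trains only a small set of prompt embeddings plus a classifier head. The names (`PromptedDetector`, `n_prompts`) and the stand-in encoder are hypothetical, not taken from the paper; a real setup would plug in the actual pre-trained detector's encoder.

```python
# Minimal soft prompt tuning sketch (illustrative; names are hypothetical).
# Only the prompt embeddings and the classifier head receive gradients;
# the pre-trained encoder stays frozen.
import torch
import torch.nn as nn

class PromptedDetector(nn.Module):
    def __init__(self, encoder: nn.Module, dim: int = 768, n_prompts: int = 16):
        super().__init__()
        self.encoder = encoder
        for p in self.encoder.parameters():
            p.requires_grad = False           # freeze the pre-trained backbone
        # Learnable prompt tokens prepended to every input sequence.
        self.prompts = nn.Parameter(torch.randn(n_prompts, dim) * 0.02)
        self.head = nn.Linear(dim, 2)         # bona fide vs. spoof

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, dim) frame-level features from a front end
        batch = feats.size(0)
        prompt = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([prompt, feats], dim=1)   # prepend prompts
        h = self.encoder(x)                     # frozen encoder
        return self.head(h.mean(dim=1))         # pooled logits

# Stand-in for a pre-trained backbone (e.g., a wav2vec 2.0-style encoder).
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True),
    num_layers=2,
)
model = PromptedDetector(encoder)
trainable = [p for p in model.parameters() if p.requires_grad]
optim = torch.optim.AdamW(trainable, lr=1e-3)  # updates prompts + head only
```

Because the optimizer touches only the prompts and the head, a tiny fraction of the parameters is updated, which is what makes adaptation feasible with limited target-domain data.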
Current AI-based audio deepfake detection methods, while effective in controlled settings, often fail to generalize to real-world scenarios and lack the transparency needed to foster user trust. This paper introduces a novel benchmark for evaluating the generalizability of these methods and explores explainability techniques to bridge the gap between detection performance and user understanding.
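One generic example of such an explainability technique is input-gradient saliency, sketched below. This is an illustration of the general idea rather than necessarily one of the paper's methods, and the model interface (a detector mapping a spectrogram to class logits) is assumed.

```python
import torch

def saliency(model, spec: torch.Tensor, target: int = 1) -> torch.Tensor:
    """Input-gradient saliency map (a generic explainability technique).
    spec: (1, F, T) log-mel spectrogram; target: class index (e.g., 1 = spoof).
    Highlights time-frequency regions that most influence the prediction."""
    spec = spec.clone().requires_grad_(True)
    logit = model(spec)[0, target]   # scalar logit for the target class
    logit.backward()
    return spec.grad.abs().squeeze(0)  # (F, T) importance map
```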
Existing audio deepfake detection models struggle to generalize across diverse datasets and against audio synthesized by advanced text-to-speech (TTS) models, highlighting the need for more robust detection methods and comprehensive benchmarks such as SONAR.
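Generalization in this literature is usually reported as the equal error rate (EER) on corpora unseen during training. Below is a minimal NumPy sketch, assuming the common (but not universal) convention that higher scores mean "more spoof-like".

```python
import numpy as np

def compute_eer(scores: np.ndarray, labels: np.ndarray) -> float:
    """Equal error rate: the operating point where the miss rate and the
    false-alarm rate cross. labels: 1 = spoof, 0 = bona fide."""
    order = np.argsort(scores)[::-1]       # descending by spoof score
    labels = labels[order]
    n_spoof = labels.sum()
    n_bona = len(labels) - n_spoof
    tp = np.cumsum(labels)                 # spoofs caught as threshold lowers
    fp = np.cumsum(1 - labels)             # bona fide wrongly flagged
    miss = 1 - tp / n_spoof                # spoofs not detected
    fa = fp / n_bona                       # false alarms on bona fide audio
    idx = np.nanargmin(np.abs(fa - miss))
    return float((fa[idx] + miss[idx]) / 2)

# Cross-dataset protocol: train on one corpus, report EER per unseen corpus,
# e.g. eval_sets = {"ASVspoof2019": (scores, labels), "InTheWild": (...), ...}
```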
The Diffusion and Flow-Matching Based Audio Deepfake (DFADD) dataset provides a comprehensive collection of spoofed audio generated by state-of-the-art diffusion and flow-matching text-to-speech (TTS) models, enabling the development of more robust anti-spoofing models.
This paper proposes a novel framework for audio deepfake detection that achieves high accuracy on available fake data and adapts to newly emerging fake data through few-shot continual learning.
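A minimal sketch of one plausible recipe: fine-tune on a handful of labeled clips from a newly observed attack while replaying a small buffer of earlier data to limit catastrophic forgetting. The `ReplayBuffer` class and the replay strategy are assumptions for illustration, not the paper's exact framework.

```python
import random
import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Tiny store of past (features, label) pairs (illustrative)."""
    def __init__(self, items, k: int = 8):
        self.items, self.k = items, k   # items: list of (tensor, int) pairs
    def sample(self):
        batch = random.sample(self.items, min(self.k, len(self.items)))
        xs, ys = zip(*batch)            # tensors must share a shape
        return torch.stack(xs), torch.tensor(ys)

def few_shot_update(model, optimizer, x_new, y_new, buffer, steps: int = 20):
    """Adapt a trained detector to a new attack using only a few clips,
    mixing in replayed old samples each step to mitigate forgetting."""
    model.train()
    for _ in range(steps):
        x_old, y_old = buffer.sample()
        x = torch.cat([x_new, x_old])
        y = torch.cat([y_new, y_old])
        loss = F.cross_entropy(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```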
This work introduces a neural collapse-based sampling approach that builds a new training database from diverse source datasets, enabling computationally efficient and generalizable audio deepfake detection models.
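One way to realize such sampling, assuming the neural collapse observation that well-trained features concentrate around their class means, is to keep only the per-class samples whose embeddings lie nearest the class mean. The function below is an illustrative reading of the idea, not the paper's exact algorithm.

```python
import numpy as np

def nc_select(embeddings: np.ndarray, labels: np.ndarray, per_class: int):
    """For each class, keep the samples whose embeddings are closest to the
    class-mean embedding -- motivated by neural collapse, where features of
    a well-trained model cluster tightly around class means.
    Returns the indices of the selected samples."""
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        mu = embeddings[idx].mean(axis=0)           # class-mean embedding
        d = np.linalg.norm(embeddings[idx] - mu, axis=1)
        keep.extend(idx[np.argsort(d)[:per_class]])  # nearest-to-mean samples
    return np.array(keep)

# Applied per source corpus and then concatenated, this yields a compact,
# diverse training database for a downstream detector.
```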
This paper presents a novel cross-domain audio deepfake detection (CD-ADD) dataset comprising over 300 hours of speech data generated by five advanced zero-shot text-to-speech (TTS) models. The dataset is designed to simulate real-world scenarios and evaluate the generalization capabilities of deepfake detection models.
The authors introduce the MLAAD dataset to address the limitations of existing anti-spoofing databases, providing a multilingual and diverse resource for training deepfake detection models.