Key Concepts
Existing audio deepfake detection models struggle to generalize across diverse datasets and against advanced text-to-speech (TTS) models, highlighting the need for more robust detection methods and comprehensive benchmarks like SONAR.
Statistics
Wave2Vec2BERT achieves an average accuracy of 0.8989 on the SONAR dataset.
Wave2Vec2BERT achieves accuracies of 1.0, 0.9062, 0.9474, 0.9712, 0.9237, 0.97, and 0.9867 on PromptTTS2, VALL-E, VoiceBox, FlashSpeech, AudioGen, and xTTS, respectively.
Wave2Vec2BERT reaches only 0.6017 accuracy on Seed-TTS and 0.7833 on OpenAI.
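The reported average is consistent with an unweighted mean over the nine per-dataset accuracies listed above: the seven values in the first list plus Seed-TTS and OpenAI. The sketch below assumes equal weighting per dataset, since the exact averaging used for SONAR is not stated here. It also shows how much the two weak results pull the average down.

```python
# Per-dataset accuracies of Wave2Vec2BERT, in the order listed above.
# Assumption: the SONAR average is an unweighted mean over these datasets.
per_dataset_accuracy = [
    1.0, 0.9062, 0.9474, 0.9712, 0.9237, 0.97, 0.9867,  # first list (seven values)
    0.6017,  # Seed-TTS
    0.7833,  # OpenAI
]


def mean(values):
    """Unweighted arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


average = mean(per_dataset_accuracy)
# Mean over the first seven values only, i.e. without Seed-TTS and OpenAI.
average_without_hard = mean(per_dataset_accuracy[:7])

print(f"all nine datasets:           {average:.4f}")
print(f"without Seed-TTS and OpenAI: {average_without_hard:.4f}")
```

Under this assumption, the mean of all nine values is 0.8989, which matches the reported figure. Without Seed-TTS and OpenAI, the mean rises to about 0.9579.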
Whisper-large achieves an accuracy of 95.72% and an AUROC of 0.9901 on LibriSeVoc.
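Accuracy and AUROC measure different things. Accuracy depends on a fixed decision threshold. AUROC is threshold-free: it equals the probability that a randomly chosen fake clip scores higher than a randomly chosen real one. The minimal sketch below computes AUROC in that rank-based (Mann-Whitney) form. It assumes label 1 means "fake" and a higher score means "more likely fake". The labels and scores are hypothetical toy values, not benchmark data.

```python
def auroc(labels, scores):
    """Rank-based AUROC: fraction of (fake, real) pairs where the fake clip
    scores higher; ties count as half a win. Assumes label 1 = fake."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Accuracy after thresholding scores (hypothetical threshold of 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: three fake clips (1) and three real clips (0).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

print(f"AUROC:    {auroc(labels, scores):.4f}")
print(f"accuracy: {accuracy(labels, scores):.4f}")
```

On this toy data, 8 of the 9 fake/real pairs are ranked correctly, so AUROC is about 0.8889. Accuracy at the 0.5 threshold is only 4/6. A detector can therefore rank clips well while still making errors at a given operating point.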
Whisper-large outperforms Whisper-tiny by 38.48% in accuracy on the In-the-wild dataset.