Core Concepts
The choice of domain-specific embeddings significantly impacts the correlation between Fréchet Audio Distance (FAD) scores and human perceptual ratings of environmental sounds.
Abstract
Directory:
Authors and Affiliations
Abstract
Introduction
Related Work
Embeddings
Experiments
Results
Conclusion
1. Authors and Affiliations:
Authors from various institutions in France, South Korea, the US, and Japan.
Investigate the correlation between FAD and human perception of environmental sounds.
2. Abstract:
Explores how alternative embeddings affect the correlation between FAD scores and perceptual ratings.
Used various embeddings tailored for music or environmental sound evaluation.
PANNs-WGM-LogMel showed the best correlation with perceptual ratings.
3. Introduction:
Generative audio synthesis evaluated based on perceptual features.
FAD widely used for audio quality assessment.
Study aims to improve FAD validity by considering different embeddings.
4. Related Work:
FAD originally proposed, with VGGish embeddings, for evaluating music enhancement algorithms.
Embeddings like VGGish and CLAP explored for music generation.
Importance of embedding choice highlighted for accurate evaluation.
5. Embeddings:
Describes the embeddings compared: VGGish, MERT, PANNs, MS-CLAP, and L-CLAP.
Each was trained on either music or environmental audio data.
Embeddings selected for their relevance to the target domain.
6. Experiments:
Used the DCASE 2023 Task 7 (Foley sound synthesis) dataset for evaluation.
Perceptual data collected for audio quality and category fit.
Spearman correlation analysis conducted for different embeddings.
7. Results:
PANNs-WGM-LogMel and MS-CLAP showed high correlations with perceptual ratings.
VGGish and MERT demonstrated weak correlations.
Embeddings' performance varied across different sound categories.
8. Conclusion:
FAD scores depend strongly on the choice of embedding.
Domain-specialized embeddings are crucial for FAD to reflect human perception.
Further research recommended to evaluate a wider range of sound categories.
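The Experiments and Results items rest on a Spearman rank correlation between FAD scores and perceptual ratings. The sketch below shows that step with SciPy's spearmanr. It assumes one FAD score and one mean listener rating per generative system, aligned by index; the exact pairing used in the study (per system, per category, or otherwise) is not stated here. The helper name fad_rating_correlation and all numbers are hypothetical, chosen only to illustrate the computation, and do not come from the paper.

```python
from scipy.stats import spearmanr


def fad_rating_correlation(fad_scores, mean_ratings):
    """Spearman rank correlation between FAD scores and mean perceptual ratings.

    Element i of each sequence is assumed to describe the same generative
    system (a hypothetical pairing used for illustration).
    Lower FAD means "closer to the reference set" while higher ratings mean
    "better", so an embedding whose FAD agrees with listeners gives rho near -1.
    """
    if len(fad_scores) != len(mean_ratings):
        raise ValueError("fad_scores and mean_ratings must have the same length")
    # spearmanr ranks both inputs and computes Pearson's r on the ranks.
    rho, p_value = spearmanr(fad_scores, mean_ratings)
    return float(rho), float(p_value)


# Made-up values for five hypothetical systems, for illustration only.
fad = [2.1, 3.4, 5.0, 6.2, 8.9]
ratings = [7.5, 6.8, 4.0, 5.1, 3.2]
rho, p = fad_rating_correlation(fad, ratings)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

Repeating this for FAD scores computed in each embedding space (VGGish, MERT, PANNs, MS-CLAP, L-CLAP) is one way to rank embeddings by how closely their FAD tracks listener judgements; because only ranks matter, the comparison is insensitive to the very different numeric scales FAD takes in different embedding spaces.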
Stats
"The FAD scores were calculated for sounds from the DCASE 2023 Task 7 dataset."
"PANNs-WGM-LogMel produces the best correlation between FAD scores and perceptual ratings."
"VGGish, the embedding used for the original Fréchet calculation, yielded a correlation below 0.1."
Quotes
"The FAD calculation compares the two datasets in terms of fit to domain with the comparison of means."
"A low FAD score indicates that the two datasets contain similar sound sources and a similar diversity."
"The choice of the embedding is a crucial part of FAD metric design."
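The quotes describe FAD as comparing the two datasets through their means (fit to domain) and their diversity. In the standard formulation, a multivariate Gaussian is fitted to each set of embeddings, and FAD is the Fréchet distance between them: ||mu_r - mu_e||^2 + tr(Sigma_r + Sigma_e - 2 (Sigma_r Sigma_e)^(1/2)). The first term is the comparison of means; the trace term captures differences in spread. Below is a minimal NumPy/SciPy sketch, assuming embeddings have already been extracted into (n_items, dim) arrays by whichever model is under test. The function name frechet_distance is hypothetical, random vectors stand in for real audio embeddings, and this is not the paper's implementation.

```python
import numpy as np
from scipy import linalg


def frechet_distance(emb_ref: np.ndarray, emb_eval: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two sets of embeddings.

    emb_ref, emb_eval: arrays of shape (n_items, dim) with dim >= 2, one
    embedding vector per row, produced by the same embedding model.
    """
    mu_r, mu_e = emb_ref.mean(axis=0), emb_eval.mean(axis=0)
    sigma_r = np.cov(emb_ref, rowvar=False)
    sigma_e = np.cov(emb_eval, rowvar=False)

    # Mean term: squared distance between the centres of the two sets
    # (the "fit to domain" comparison of means).
    mean_term = float(np.sum((mu_r - mu_e) ** 2))

    # Covariance term: mismatch in spread, i.e. in diversity.
    # sqrtm can return negligible imaginary parts due to numerical error,
    # so only the real part is kept.
    covmean = np.real(linalg.sqrtm(sigma_r @ sigma_e))
    cov_term = float(np.trace(sigma_r + sigma_e - 2.0 * covmean))

    return mean_term + cov_term


# Toy check with random vectors standing in for real audio embeddings.
rng = np.random.default_rng(0)
ref = rng.normal(size=(500, 8))
print(frechet_distance(ref, ref))        # close to 0: identical sets
print(frechet_distance(ref, ref + 0.5))  # mean shift only: about 8 * 0.5**2 = 2.0
```

Because the means and covariances are taken in the embedding space, the same pair of audio sets can receive very different FAD values under different embedding models, which is why the choice of embedding matters as much as the formula itself.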