Comprehensive Benchmark for Evaluating Audio Representation Learning Models Across Speech, Music, and Acoustic Event Domains
A comprehensive benchmark, ARCH, is introduced to systematically evaluate audio representation learning models across diverse domains including speech, music, and acoustic events. The benchmark enables standardized comparison of state-of-the-art self-supervised learning models and provides insights into their generalization capabilities.
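To make "standardized comparison" concrete, the sketch below shows one common way such an evaluation can be organized: every encoder is frozen, and the same lightweight linear probe is trained on its embeddings for every task, so score differences reflect the representations rather than the classifier. This is an illustrative assumption, not ARCH's actual protocol or API; the function names (`fit_linear_probe`, `probe_accuracy`, `run_benchmark`), the ridge-regression probe, and the task tuple layout are all hypothetical.

```python
import numpy as np


def fit_linear_probe(X, y, n_classes, l2=1e-3):
    """Fit a ridge-regression probe on one-hot targets (hypothetical probe choice)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    Y = np.eye(n_classes)[y]                        # one-hot encode integer labels
    A = Xb.T @ Xb + l2 * np.eye(Xb.shape[1])        # regularized normal equations
    return np.linalg.solve(A, Xb.T @ Y)


def probe_accuracy(W, X, y):
    """Classification accuracy of the probe weights W on embeddings X."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return float(np.mean(np.argmax(Xb @ W, axis=1) == y))


def run_benchmark(encoders, tasks):
    """Score every frozen encoder on every task with the same probe.

    encoders: dict mapping name -> callable(inputs) -> 2-D embedding array
    tasks:    dict mapping name -> (x_train, y_train, x_test, y_test, n_classes)
    Returns a dict keyed by (encoder_name, task_name) with test accuracy.
    """
    results = {}
    for enc_name, encode in encoders.items():
        for task_name, (x_tr, y_tr, x_te, y_te, k) in tasks.items():
            W = fit_linear_probe(encode(x_tr), y_tr, k)   # encoder stays frozen
            results[(enc_name, task_name)] = probe_accuracy(W, encode(x_te), y_te)
    return results
```

In practice each entry in `encoders` would wrap a pretrained model that maps audio clips to fixed-size embeddings, and each entry in `tasks` would hold one dataset from the speech, music, or acoustic-event domain. Averaging the resulting scores per domain gives a rough view of how well a model generalizes.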